Description

The present invention relates to a genomic DNA, more particularly, a genomic DNA encoding a polypeptide capable
of inducing the production of interferon-γ (hereinafter abbreviated as "IFN-γ") by immunocompetent cells.

The present inventors successfully isolated a polypeptide capable of inducing the production of IFN-γ by
immunocompetent cells and cloned a cDNA encoding the polypeptide, as disclosed in Japanese Patent Kokai Nos.
27,189/96 and 193,098/96. Because the present polypeptide enhances the cytotoxicity of killer cells and induces
their formation, in addition to inducing IFN-γ, a useful biologically active protein, it is expected
to be widely used as an agent for viral diseases, microbial diseases, tumors and/or immunopathies.
It is said that a polypeptide produced by gene expression may be partially cleaved and/or glycosylated through
processing by intracellular enzymes in human cells. A polypeptide to be used in therapeutic agents should
preferably be processed in the same way as in human cells, but human cell lines generally have the disadvantage
of producing only small amounts of the present polypeptide, as described in Japanese Patent Application
No. 269,105/96. Recombinant DNA techniques should therefore be applied to obtain the present polypeptide in a
desired amount. To produce the polypeptide processed as in human cells by recombinant DNA techniques,
mammalian cells should be used as the hosts.

In view of the foregoing, the first object of the present invention is to provide a DNA which efficiently expresses the
polypeptide when introduced into a mammalian host cell.

The second object of the present invention is to provide a transformant into which the DNA is introduced. 

The third object of the present invention is to provide a process for preparing a polypeptide, using the transformant. 


[Means to Attain the Object] 

Through intensive study aimed at the above objects, the present inventors found that a genomic DNA
encoding the present polypeptide efficiently expresses the polypeptide when introduced into mammalian
host cells. They also found that the polypeptide thus obtained possessed significantly higher biological activities
than that obtained by expressing a cDNA encoding the polypeptide in Escherichia coli.

The first object of the present invention is attained by a genomic DNA encoding a polypeptide with the amino acid
sequence of SEQ ID NO:1 (where the symbol "Xaa" means "isoleucine" or "threonine") or a sequence homologous
thereto, the polypeptide inducing interferon-γ production by immunocompetent cells.
The second object of the present invention is attained by a transformant formed by introducing the genomic DNA
into a mammalian host cell.

The third object of the present invention is attained by a process for preparing a polypeptide, which comprises (a) 
culturing the transformant in a nutrient medium, and (b) collecting the polypeptide from the resultant culture. 

FIG.1 is a restriction map of a recombinant DNA containing a genomic DNA according to the present invention. 

The symbols are as follows: "HindIII" indicates a cleavage site for the restriction enzyme HindIII, and
"HuIGIF" indicates a genomic DNA according to the present invention.

The following are the preferred embodiments of the present invention. This invention is based
on the identification of a genomic DNA encoding the polypeptide with the amino acid sequence of SEQ ID NO:1 or a
sequence homologous thereto, and on the finding that the genomic DNA efficiently expresses the polypeptide with
high biological activities when introduced into mammalian host cells. The genomic DNA of the present invention
usually contains two or more exons, at least one of which possesses a part or the whole of the nucleotide sequence
of SEQ ID NO:2. The wording "a part" covers a single nucleotide as well as a run of two or more consecutive
nucleotides in SEQ ID NO:2. Examples of the exons are SEQ ID NOs:3 and 4. Human genomic DNA may contain
additional exons with SEQ ID NOs:5 to 7. Since the present genomic DNA is derived from a mammalian genomic DNA,
it contains introns, a distinctive feature of mammalian genomic DNAs. The present genomic DNA usually has two or
more introns such as SEQ ID NOs:8 to 12.

More particular examples of the present genomic DNA include DNAs with SEQ ID NOs:13 and 14 or sequences
complementary thereto. The DNAs with SEQ ID NOs:13 and 14 are substantially the same. The DNA with SEQ ID NO:14
contains coding regions for a leader peptide, consisting of the nucleotides 15,607th-15,685th, 17,057th-17,068th
and 20,452nd-20,468th; coding regions for the present polypeptide, consisting of the nucleotides 20,469th-20,586th,
21,921st-22,054th and 26,828th-27,046th; and introns, consisting of the nucleotides 15,686th-17,056th,
17,069th-20,451st, 20,587th-21,920th and 22,055th-26,827th. The genomic DNA with SEQ ID NO:13 is suitable for
expressing the polypeptide in mammalian host cells.
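
The coordinates quoted above for SEQ ID NO:14 can be cross-checked with simple arithmetic. The following is a minimal Python sketch, not part of the original disclosure; the names LEADER, MATURE, INTRONS, total_length and is_contiguous are illustrative assumptions. It sums the segment lengths and tests whether the leader, coding and intron segments abut without gaps or overlaps.

```python
# Coordinates (1-based, inclusive) as quoted in the text for SEQ ID NO:14.
# The variable and function names are hypothetical, chosen for this sketch.
LEADER = [(15607, 15685), (17057, 17068), (20452, 20468)]
MATURE = [(20469, 20586), (21921, 22054), (26828, 27046)]
INTRONS = [(15686, 17056), (17069, 20451), (20587, 21920), (22055, 26827)]


def total_length(segments):
    """Sum the lengths of 1-based inclusive (start, end) segments."""
    return sum(end - start + 1 for start, end in segments)


def is_contiguous(segments):
    """Return True if the sorted segments abut with no gap and no overlap."""
    ordered = sorted(segments)
    return all(prev[1] + 1 == nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


leader_nt = total_length(LEADER)
mature_nt = total_length(MATURE)

print("leader:", leader_nt, "nt =", leader_nt // 3, "codons")
print("mature:", mature_nt, "nt =", mature_nt // 3, "codons")
print("segments tile without gaps:", is_contiguous(LEADER + MATURE + INTRONS))
```

Under these coordinates the leader segments total 108 nucleotides (36 codons) and the coding segments for the polypeptide total 471 nucleotides, which agrees with the 36-amino-acid leader peptide and the 471 bp figure given for SEQ ID NO:2 in Example 1-8.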

Generally in this field, when a DNA encoding a polypeptide is artificially expressed in a host, one or more nucleotides
in the DNA may be replaced by different ones, and appropriate promoter(s) and/or enhancer(s) may be linked to the
DNA to improve the expression efficiency or the properties of the expressed polypeptide. The present genomic DNA
can be altered in the same way. Therefore, as far as the biological activities of the expressed polypeptides are
not substantially changed, the present genomic DNA should include DNAs encoding functional equivalents of the






EP 0 816 499 A2 



polypeptide, formed as follows: one or more nucleotides in SEQ ID NOs:3 to 14 are replaced by different ones; the
untranslated regions and/or the coding region for a leader peptide at the 5'- and/or 3'-termini of SEQ ID NOs:3, 4, 5,
6, 7, 13 and 14 are deleted; and appropriate oligonucleotides are linked to either or both ends of SEQ ID NO:13.

The present genomic DNA includes DNAs in general which are derived from a genome containing the nucleotide
sequences above, and it is not restricted as to source or origin as far as it has once been isolated from its original
organism. For example, the present genomic DNA can be obtained by chemical synthesis based on SEQ ID
NOs:2 to 14, or by isolation from a human genomic DNA. The isolation of the present genomic DNA from a human
genomic DNA comprises (a) isolating a genomic DNA from human cells by conventional methods, (b) screening the
genomic DNA with probes or primers, which are chemically synthesized oligonucleotides with a part or the whole
of the nucleotide sequence of SEQ ID NO:2, and (c) collecting a DNA to which the probes or primers specifically
hybridize. Once the present genomic DNA is obtained, it can be replicated without limit, either by constructing a
recombinant DNA with an autonomously replicable vector by conventional methods, introducing the recombinant DNA
into an appropriate host such as a microorganism or an animal cell and culturing the resulting transformant, or by
applying a PCR method.

The present genomic DNA is very useful in producing the polypeptide by recombinant DNA techniques since it
efficiently expresses the polypeptide with high biological activities when introduced into mammalian host cells. The
present invention further provides a process for preparing a polypeptide using a specific genomic DNA, comprising
the steps of (a) culturing a transformant formed by introducing the present genomic DNA into mammalian host cells,
and (b) collecting the polypeptide which induces IFN-γ production by immunocompetent cells from the resultant culture.

The following explains the process for preparing the polypeptide according to the present invention. The present
genomic DNA is usually introduced into host cells in the form of a recombinant DNA. The recombinant DNA, comprising
the present genomic DNA and an autonomously replicable vector, can be prepared relatively easily by conventional
recombinant DNA techniques once the genomic DNA is available. The vectors into which the present genomic DNA
can be inserted include plasmid vectors such as pcD, pcDL-SRα, pKY4, pCDM8, pCEV4 and pME18S. The
autonomously replicable vectors usually further contain nucleotide sequences appropriate for the expression of the
present recombinant DNA in each host cell, including sequences for promoters, enhancers, replication origins,
transcription termination sites, splicing sequences and/or selective markers. Heat shock protein promoters or IFN-α
promoters, as disclosed in Japanese Patent Kokai No. 163,368/95 by the same applicant, make it possible to
regulate the expression of the present genomic DNA artificially by external stimuli.

To insert the present genomic DNA into vectors, any conventional method used in this field can be employed:
genes containing the present genomic DNA and autonomously replicable vectors are cleaved with restriction enzymes
and/or ultrasonication, and the resultant DNA fragments and vector fragments are ligated. Cleaving the genes
and vectors with restriction enzymes which act specifically on nucleotides, more particularly, AccI, BamHI, BglII,
BstXI, EcoRI, HindIII, NotI, PstI, SacI, SalI, SmaI, SpeI, XbaI, XhoI, etc., facilitates the ligation of the DNA
fragments and the vector fragments. To ligate the DNA fragments and the vector fragments, they are, if necessary,
first annealed and then treated with a DNA ligase in vivo or in vitro. The recombinant DNAs thus obtained can be
replicated without limit in hosts derived from microorganisms or animals.

Any cells conventionally used as hosts in this field can be used as the host cells. Examples are epithelial,
interstitial and hemopoietic cells derived from human, monkey, mouse and hamster, more particularly, 3T3 cells,
C127 cells, CHO cells, CV-1 cells, COS cells, HeLa cells, MOP cells and their mutants. Cells which inherently
produce the present polypeptide can also be used as the host cells. Examples are human hemopoietic cells such as
lymphoblasts, lymphocytes, monoblasts, monocytes, myeloblasts, myelocytes, granulocytes and macrophages, and
human epithelial and interstitial cells derived from solid tumors such as pulmonary carcinoma, large bowel cancer and
colon cancer. More particular examples of the above hemopoietic cells are leukemia cell lines such as HBL-38 cells,
HL-60 cells ATCC CCL240, K-562 cells ATCC CCL243, KG-1 cells ATCC CCL246, Mo cells ATCC CRL8066, THP-1
cells ATCC TIB202 and U-937 cells ATCC CRL1593.2, described by J. Minowada et al. in "Cancer Research", Vol.10,
pp. 1-18 (1988), which are derived from leukemias or lymphomas including myelogenous leukemias, promyelocytic
leukemias, monocytic leukemias, adult T-cell leukemias and hairy cell leukemias, and their mutants. These leukemia
cell lines and their mutants process the present polypeptide so well that they readily yield the polypeptide with
higher biological activities when used as hosts.

To introduce the present DNA into the hosts, conventional methods can be used, such as the DEAE-dextran method,
calcium phosphate transfection method, electroporation method, lipofection method, microinjection method, and viral
infection method using retrovirus, adenovirus, herpesvirus or vaccinia virus. The polypeptide-producing clones among
the transformants can be selected by applying the colony hybridization method or by observing the polypeptide
production after culturing the transformants in culture media. The recombinant DNA techniques using mammalian
cells as hosts are detailed, for example, in "Jikken-Igaku-Bessatsu Saibo-Kogaku Handbook (The handbook for the cell
engineering)" (1992), edited by Toshio KUROKI, Masaru TANIGUCHI and Mitsuo OSHIMURA, published by YODOSHA
CO., LTD., Tokyo, Japan, and "Jikken-Igaku-Bessatsu Biomanual Series 3 Idenshi Cloning Jikken-Ho (The experimental
methods for the gene cloning)" (1993), edited by Takahi YOKOTA and Ken-ichi ARAI, published by YODOSHA CO.,
LTD., Tokyo, Japan.

The transformants thus obtained produce the present polypeptide intracellularly and/or extracellularly when cultured
in culture media. As the culture media, conventional ones used for mammalian cells can be used. The culture media
generally comprise (a) buffers as a base, (b) inorganic ions such as sodium ion, potassium ion, calcium ion, phosphate
ion and chloride ion, (c) micronutrients, carbon sources, nitrogen sources, amino acids and vitamins, which are added
depending on the metabolic ability of the cells, and (d) sera, hormones, cell growth factors and cell adhesion factors,
which are added if necessary. Examples of individual media include 199 medium, DMEM medium, Ham's F12 medium,
IMDM medium, MCDB 104 medium, MCDB 153 medium, MEM medium, RD medium, RITC 80-7 medium, RPMI-1630
medium, RPMI-1640 medium and WAJC 404 medium. The cultures containing the present polypeptide are obtainable
by inoculating the transformants into the culture media to give a cell density of 1 x 10⁴ to 1 x 10⁷ cells/ml, more
preferably, 1 x 10⁵ to 1 x 10⁶ cells/ml, and then subjecting them to suspension or monolayer culture at about 37°C
for 1-7 days, more preferably, 2-4 days, while appropriately replacing the culture media with fresh media. The
cultures thus obtained usually contain the present polypeptide in a concentration of about 1-100 µg/ml, which may vary
depending on the types of the transformants or the culture conditions used.
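
As a worked illustration of the inoculation densities stated above, the following minimal Python sketch computes how much of a cell suspension is needed to reach a target density, using the dilution relation C1 x V1 = C2 x V2. The function names and the example stock density are hypothetical and serve only as an illustration; only the two density ranges come from the text.

```python
# Density ranges (cells/ml) taken from the text; all names are hypothetical.
BROAD_RANGE = (1e4, 1e7)
PREFERRED_RANGE = (1e5, 1e6)


def stock_volume_ml(stock_density, target_density, final_volume_ml):
    """Volume of stock suspension needed so that the final culture has the
    target density (C1 * V1 = C2 * V2)."""
    if target_density > stock_density:
        raise ValueError("stock is more dilute than the target density")
    return target_density * final_volume_ml / stock_density


def classify(density):
    """Report where a seeding density falls relative to the stated ranges."""
    if PREFERRED_RANGE[0] <= density <= PREFERRED_RANGE[1]:
        return "preferred"
    if BROAD_RANGE[0] <= density <= BROAD_RANGE[1]:
        return "permitted"
    return "out of range"


# Assumed example: a stock at 2 x 10^6 cells/ml seeded at 5 x 10^5 cells/ml in 10 ml.
volume = stock_volume_ml(2e6, 5e5, 10.0)
print(f"use {volume:.2f} ml of stock, density is {classify(5e5)}")
```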

While the cultures thus obtained can be used intact as an IFN-γ inducer, they are usually subjected, before use, to a
step for separating the present polypeptide from the cells or the cell debris by filtration, centrifugation, etc., which
may follow a step for disrupting the cells with ultrasonication, cell-lytic enzymes and/or detergents if desired, and to
a step for purifying the polypeptide. The cultures from which the cells or cell debris have been removed are usually
subjected to conventional methods used in this field for purifying biologically active polypeptides, such as salting-out,
dialysis, filtration, concentration, separatory sedimentation, ion-exchange chromatography, gel filtration
chromatography, adsorption chromatography, chromatofocusing, hydrophobic chromatography, reversed phase
chromatography, affinity chromatography, gel electrophoresis and/or isoelectric focusing. The resultant purified
polypeptide can be concentrated and/or lyophilized into liquids or solids depending on final uses. The monoclonal
antibodies disclosed in Japanese Patent Kokai No. 231,598/96 by the same applicant are extremely useful for purifying
the present polypeptide. Immunoaffinity chromatography using the monoclonal antibodies yields the present
polypeptide in a relatively high purity at the lowest cost and labor.

The polypeptide obtainable by the process according to the present invention exerts strong effects in the treatment
and/or prevention of IFN-γ- and/or killer cell-susceptive diseases, since it enhances the cytotoxicity of killer cells
and induces their formation in addition to inducing IFN-γ, a useful biologically active protein, as described above.
The polypeptide according to the present invention has a high IFN-γ-inducing activity, so that only a small amount is
needed to induce a desired amount of IFN-γ. The polypeptide has such low toxicity that it scarcely causes serious
side effects even when administered at a relatively high dose. Therefore, the polypeptide has the advantage that it
can readily induce IFN-γ in a desired amount without strict control of the dosage. The uses as agents for susceptive
diseases are detailed in Japanese Patent Application No. 28,722/96 by the same applicant.

The present genomic DNA is also useful for so-called "gene therapy". According to conventional gene therapy,
the present DNA can be introduced into patients with IFN-γ- and/or killer cell-susceptive diseases by direct injection
after the DNA is inserted into vectors derived from viruses such as retrovirus, adenovirus and adeno-associated virus
or is incorporated into cationic or membrane-fusible liposomes, or by self-transplanting lymphocytes which are
collected from the patients and into which the DNA is then introduced. In adoptive immunotherapy with gene therapy,
the present DNA is introduced into effector cells as in conventional gene therapy. This can enhance the cytotoxicity
of the effector cells to tumor cells, resulting in improvement of the adoptive immunotherapy. In tumor vaccine therapy
with gene therapy, tumor cells from patients, into which the present genomic DNA is introduced as in conventional
gene therapy, are self-transplanted after being proliferated ex vivo to a desired cell number. The transplanted tumor
cells act as vaccines in the patients to exert a strong antitumor immunity specific to antigens. Thus, the present
genomic DNA exhibits considerable effects in gene therapy for diseases including viral diseases, microbial diseases,
malignant tumors and immunopathies. The general procedures for gene therapy are detailed in "Jikken-Igaku-Bessatsu
Biomanual UP Series Idenshichiryo-no-Kisogijutsu (Basic techniques for the gene therapy)" (1996), edited by Takashi
ODAJIMA, Izumi SAITO and Keiya OZAWA, published by YODOSHA CO., LTD., Tokyo, Japan.

The following examples explain the present invention, and the techniques used therein are conventional ones used
in this field. For example, the techniques are described in "Jikken-Igaku-Bessatsu Saibo-Kogaku Handbook (The
handbook for the cell engineering)" (1992), edited by Toshio KUROKI, Masaru TANIGUCHI and Mitsuo OSHIMURA,
published by YODOSHA CO., LTD., Tokyo, Japan, and "Jikken-Igaku-Bessatsu Biomanual Series 3 Idenshi Cloning
Jikken-Ho (The experimental methods for the gene cloning)" (1993), edited by Takahi YOKOTA and Ken-ichi ARAI,
published by YODOSHA CO., LTD., Tokyo, Japan.
Example 1

Cloning genomic DNA and determination of nucleotide sequence 

Example 1-1

Determination of partial nucleotide sequence 

Five ng of "PromoterFinder™ DNA PvuII LIBRARY", a human placental genomic DNA library commercialized by
CLONTECH Laboratories, Inc., California, USA, 5 µl of 10 x Tth PCR reaction solution, 2.2 µl of 25 mM magnesium
acetate, 4 µl of 2.5 mM dNTP-mixed solution, one µl of the mixed solution of 2 unit/µl rTth DNA polymerase XL and
2.2 µg/µl Tth Start Antibody in a ratio of 4:1 by volume, 10 pmol of an oligonucleotide with the nucleotide sequence of
5'-CCATCCTAATACGACTCACTATAGGGC-3' as an adaptor primer, and 10 pmol of an oligonucleotide with the
nucleotide sequence of 5'-TTCCTCTTCCCGAAGCTGTGTAGACTGC-3' as an anti-sense primer, which was chemically
synthesized based on the sequence of the nucleotides 88th-115th in SEQ ID NO:2, were mixed and brought to a
volume of 50 µl with sterilized distilled water. After incubation at 94°C for one min, the mixture was subjected to
7 cycles of incubations at 94°C for 25 sec and at 72°C for 4 min, followed by 32 cycles of incubations at 94°C for
25 sec and at 67°C for 4 min to perform PCR.
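
The final concentrations implied by the volumes above follow from simple dilution arithmetic. The sketch below is a minimal Python illustration, not part of the original protocol; the function and variable names are assumptions, and the dNTP figure is expressed in the same units as the stated 2.5 mM stock.

```python
# Reaction set-up from the text: stock concentrations and volumes in a 50 µl reaction.
# All names are hypothetical and chosen for this sketch.
TOTAL_UL = 50.0


def final_concentration(stock_conc, stock_volume_ul, total_volume_ul=TOTAL_UL):
    """Concentration after dilution, in the same units as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul


buffer_x = final_concentration(10.0, 5.0)   # 10 x reaction solution -> about 1 x
mg_mM = final_concentration(25.0, 2.2)      # magnesium acetate -> about 1.1 mM
dntp_mM = final_concentration(2.5, 4.0)     # dNTP-mixed solution -> about 0.2 mM
primer_uM = 10.0 / TOTAL_UL                 # 10 pmol in 50 µl; pmol/µl equals µM

# Volume remaining for template, primers and water after the listed liquid components.
listed_ul = 5.0 + 2.2 + 4.0 + 1.0
remaining_ul = TOTAL_UL - listed_ul

print(f"buffer {buffer_x:.1f} x, Mg {mg_mM:.2f} mM, dNTP {dntp_mM:.2f} mM, "
      f"each primer {primer_uM:.2f} µM, remaining {remaining_ul:.1f} µl")
```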

The reaction mixture was diluted 100-fold with sterilized distilled water. One µl of the dilution, 5 µl of 10 x Tth
PCR reaction solution, 2.2 µl of 25 mM magnesium acetate, 4 µl of 2.5 mM dNTP-mixed solution, one µl of the mixed
solution of 2 unit/µl rTth DNA polymerase XL and 2.2 µg/µl Tth Start Antibody in a ratio of 4:1 by volume, 10 pmol of
an oligonucleotide with the nucleotide sequence of 5'-CTATAGGGCACGCGTGGT-3' as a nested primer, and 10 pmol
of an oligonucleotide with the nucleotide sequence of 5'-TTCCTCTTCCCGAAGCTGTGTAGACTGC-3' as an
anti-sense primer, which was chemically synthesized as above, were mixed and brought to a volume of 50 µl with
sterilized distilled water. After incubation at 94°C for one min, the mixture was subjected to 5 cycles of incubations
at 94°C for 25 sec and at 72°C for 4 min, followed by 22 cycles of incubations at 94°C for 25 sec and at 67°C for
4 min to perform PCR for amplifying a DNA fragment of the present genomic DNA. The genomic DNA library and
reagents for PCR used above were mainly from "PromoterFinder™ DNA WALKING KITS", commercialized by CLONTECH
Laboratories, Inc., California, USA.

An adequate amount of the PCR product thus obtained was mixed with 50 ng of "pT7 Blue(R)", a plasmid vector
commercialized by Novagen, Inc., WI, USA, and an adequate amount of T4 DNA ligase, and 100 mM ATP was added
to give a final concentration of one mM, followed by incubation at 16°C for 18 hr to insert the DNA fragment into the
plasmid vector. The obtained recombinant DNA was introduced into an Escherichia coli JM109 strain by the competent
cell method to form a transformant, which was then inoculated into L-broth medium (pH 7.2) containing 50 µg/ml
ampicillin and cultured at 37°C for 18 hr. The cells were isolated from the resulting culture and then subjected to the
conventional alkali-SDS method to collect a recombinant DNA. Analysis by the dideoxy method confirmed that the
recombinant DNA contained the DNA fragment with a sequence of the nucleotides 5,150th-6,709th in SEQ ID NO:14.

Example 1-2 


Determination of partial nucleotide sequence 

PCR was performed under the same conditions as the first PCR in Example 1-1, except that an oligonucleotide with
the nucleotide sequence of 5'-GTAAGTTTTCACCTTCCAACTGTAGAGTCC-3', which was chemically synthesized based on
the nucleotide sequence of the DNA fragment in Example 1-1, was used as an anti-sense primer.

The reaction mixture was diluted 100-fold with sterilized distilled water. One µl of the dilution was placed into
a reaction tube, and PCR was performed under the same conditions as the second PCR in Example 1-1 to amplify
another DNA fragment of the present genomic DNA, except that an oligonucleotide with the nucleotide sequence of
5'-GGGATCAAGTAGTGATCAGAAGCAGCACAC-3', which was chemically synthesized based on the nucleotide sequence of
the DNA fragment in Example 1-1, was used as an anti-sense primer.

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 1st-5,228th in
SEQ ID NO:14.

Example 1-3

Determination of partial nucleotide sequence 

0.5 µg of a human placental genomic DNA, commercialized by CLONTECH Laboratories, Inc., California, USA,
5 µl of 10 x PCR reaction solution, 8 µl of 2.5 mM dNTP-mixed solution, one µl of the mixed solution of 5 unit/µl
"TAKARA LA Taq POLYMERASE" and 1.1 µg/µl "TaqStart ANTIBODY" in a ratio of 1:1 by volume, both commercialized
by Takara Syuzo Co., Tokyo, Japan, 10 pmol of an oligonucleotide with the nucleotide sequence of
5'-CCTGGCTGCCAACTCTGGCTGCTAAAGCGG-3' as a sense primer, chemically synthesized based on a sequence of the
nucleotides 46th-75th in SEQ ID NO:2, and 10 pmol of an oligonucleotide with the nucleotide sequence of
5'-GTATTGTCAATAAATTTCATTGCCACAAAGTTG-3' as an anti-sense primer, chemically synthesized based on a sequence
of the nucleotides 210th-242nd in SEQ ID NO:2, were mixed and brought to a volume of 50 µl with sterilized distilled
water. After incubation at 94°C for one min, the mixture was subjected to 5 cycles of incubations at 98°C for 20 sec
and at 68°C for 10 min, followed by 25 cycles of incubations at 98°C for 20 sec and at 68°C for 10 min, with 5 sec
added to the incubation time in every cycle, and finally incubated at 72°C for 10 min to amplify a further DNA
fragment of the present genomic DNA. The reagents for PCR used above were mainly from "TAKARA LA PCR KIT
VERSION 2", commercialized by Takara Syuzo Co., Tokyo, Japan.
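
The cycling programs in these examples can be totalled to estimate the hold time of a run. The following minimal Python sketch is an illustration only: the function name and the block layout are assumptions, temperature ramping is ignored, and it is assumed that the 5 sec increment applies to the 68°C step and starts from the second cycle of the 25-cycle block, which the text does not specify.

```python
def program_seconds(initial_s, blocks, final_s=0):
    """Total hold time of a PCR program in seconds, ignoring ramp times.

    Each block is (cycles, denature_s, extend_s, increment_s). The extension
    step is assumed to grow by increment_s per cycle, starting from the
    second cycle of the block (a hypothetical reading of the text).
    """
    total = initial_s + final_s
    for cycles, denature_s, extend_s, increment_s in blocks:
        for cycle in range(cycles):
            total += denature_s + extend_s + increment_s * cycle
    return total


# Example 1-3: 94°C 1 min; 5 x (98°C 20 s, 68°C 10 min);
# 25 x (98°C 20 s, 68°C 10 min + 5 s per cycle); 72°C 10 min.
example_1_3 = program_seconds(60, [(5, 20, 600, 0), (25, 20, 600, 5)], final_s=600)

# First PCR of Example 1-1: 94°C 1 min; 7 x (94°C 25 s, 72°C 4 min);
# 32 x (94°C 25 s, 67°C 4 min).
example_1_1 = program_seconds(60, [(7, 25, 240, 0), (32, 25, 240, 0)])

print(f"Example 1-3: {example_1_3 / 3600:.2f} h, Example 1-1: {example_1_1 / 3600:.2f} h")
```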

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 6,640th-15,671st in
SEQ ID NO:14.

Example 1-4

Determination of partial nucleotide sequence

PCR was performed under the same conditions as the PCR in Example 1-3 to amplify a further DNA fragment
of the present genomic DNA, except that an oligonucleotide with the nucleotide sequence of
5'-AAGATGGCTGCTGAACCAGTAGAAGACAATTGC-3', chemically synthesized based on a sequence of the nucleotides
175th-207th in SEQ ID NO:2, was used as a sense primer, an oligonucleotide with the nucleotide sequence of
5'-TCCTTGGTCAATGAAGAGAACTTGGTC-3', chemically synthesized based on a sequence of the nucleotides
334th-360th in SEQ ID NO:2, was used as an anti-sense primer, and, after incubation at 98°C for 20 sec, the reaction
mixture was subjected to 30 cycles of incubations at 98°C for 20 sec and at 68°C for 3 min, followed by incubation
at 72°C for 10 min.

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 15,604th-20,543rd in
SEQ ID NO:14.

Example 1-5 


Determination of partial nucleotide sequence 

PCR was performed under the same conditions as the PCR in Example 1-4 to amplify a further DNA fragment
of the present genomic DNA, except that an oligonucleotide with the nucleotide sequence of
5'-CCTGGAATCAGATTACTTTGGCAAGCTTGAATC-3', chemically synthesized based on the sequence of the nucleotides
273rd-305th in SEQ ID NO:2, was used as a sense primer, and an oligonucleotide with the nucleotide sequence of
5'-GGAAATAATTTTGTTCTCACAGGAGAGAGTTG-3', chemically synthesized based on the sequence of the nucleotides
500th-531st in SEQ ID NO:2, was used as an anti-sense primer.

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 20,456th-22,048th in
SEQ ID NO:14.

Example 1-6

Determination of partial nucleotide sequence 

PCR was performed under the same conditions as the PCR in Example 1-4 to amplify a further DNA fragment
of the present genomic DNA, except that an oligonucleotide with the nucleotide sequence of
5'-GCCAGCCTAGAGGTATGGCTGTAACTATCTC-3', chemically synthesized based on the sequence of the nucleotides
449th-479th in SEQ ID NO:2, was used as a sense primer, and an oligonucleotide with the nucleotide sequence of
5'-GGCATGAAATTTTAATAGCTAGTCTTCGTTTTG-3', chemically synthesized based on the sequence of the nucleotides
745th-777th in SEQ ID NO:2, was used as an anti-sense primer.

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 21,996th-27,067th in
SEQ ID NO:14.


Example 1-7 

Determination of partial nucleotide sequence 

PCR was performed under the same conditions as the first PCR in Example 1-2 to amplify a further DNA fragment
of the present genomic DNA, except that an oligonucleotide with the nucleotide sequence of
5'-GTGACATCATATTCTTTCAGAGAAGTGTCC-3', chemically synthesized based on the sequence of the nucleotides
575th-604th in SEQ ID NO:2, was used as a sense primer.

The reaction mixture was diluted 100-fold with sterilized distilled water. One µl of the dilution was placed into
a reaction tube, and PCR was performed under the same conditions as the second PCR in Example 1-2 to amplify a
further DNA fragment of the present genomic DNA, except that an oligonucleotide with the sequence of
5'-GCAATTTGAATCTTCATCATACGAAGGATAC-3', chemically synthesized based on a sequence of the nucleotides
624th-654th in SEQ ID NO:2, was used as a sense primer.

The DNA fragment was inserted into the plasmid vector as in Example 1-1 to obtain a recombinant DNA.
The recombinant DNA was replicated in Escherichia coli before being collected. The analysis of the collected
recombinant DNA confirmed that it contained the DNA fragment with a sequence of the nucleotides 26,914th-28,994th in
SEQ ID NO:14.

Example 1-8

Determination of complete nucleotide sequence 

Comparison of the nucleotide sequence of SEQ ID NO:2, which was proved to encode the present polypeptide as
disclosed in Japanese Patent Kokai No. 193,098/96 by the same applicant, with the partial nucleotide sequences
identified in Examples 1-1 to 1-7 proved that the present genomic DNA contained the nucleotide sequence of
SEQ ID NO:14. SEQ ID NO:14, consisting of 28,994 base pairs (bp), was far longer than SEQ ID NO:2, consisting
of only 471 bp. This suggested that SEQ ID NO:14 contained introns, a characteristic of eukaryotic
cells.
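
Whether the seven partial sequences of Examples 1-1 to 1-7 together span the full 28,994 bp can be checked from their stated coordinates. The sketch below is a minimal Python illustration; the dictionary layout and function names are assumptions, and the coordinates are those reported at the end of each example.

```python
# Fragment coordinates in SEQ ID NO:14 (1-based, inclusive), keyed by example number.
# Names are hypothetical and chosen for this sketch.
FRAGMENTS = {
    "1-2": (1, 5228),
    "1-1": (5150, 6709),
    "1-3": (6640, 15671),
    "1-4": (15604, 20543),
    "1-5": (20456, 22048),
    "1-6": (21996, 27067),
    "1-7": (26914, 28994),
}
TOTAL_BP = 28994


def overlap_lengths(fragments):
    """Overlap in bp between each pair of neighbouring fragments, in 5' to 3' order."""
    ordered = sorted(fragments.values())
    return [prev_end - next_start + 1
            for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])]


def covers(fragments, total_bp):
    """True if the fragments leave no uncovered position from 1 to total_bp."""
    reach = 0
    for start, end in sorted(fragments.values()):
        if start > reach + 1:
            return False
        reach = max(reach, end)
    return reach >= total_bp


print("overlaps (bp):", overlap_lengths(FRAGMENTS))
print("full coverage:", covers(FRAGMENTS, TOTAL_BP))
```

With these coordinates every neighbouring pair overlaps by at least 53 bp, so the fragments can be assembled into one contiguous sequence.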

The locations in SEQ ID NO:14 of the partial nucleotide sequences of SEQ ID NO:2, i.e., exons, and of the donor and
acceptor sites of introns, consisting of the nucleotides GT and AG respectively, were then examined. Consequently, it
was proved that SEQ ID NO:14 contained at least 5 introns, located in the order of SEQ ID NOs:10, 11, 12, 8
and 9 in the direction from the 5'- to the 3'-terminus. Therefore, the sequences between neighboring introns must be
exons, which were thought to be located in the order of SEQ ID NOs:5, 6, 3, 4 and 7 in the direction from the 5'- to
the 3'-terminus. It was also proved that SEQ ID NO:7 contained the 3'-untranslated region in addition to the exon.
The features of the sequence thus elucidated are annotated in SEQ ID NO:14.
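
The GT/AG rule used above to delimit introns can be illustrated with a short check. The following is a minimal Python sketch on an invented toy sequence, since the sequence listing itself is not reproduced here; the function names, the toy sequence and its coordinates are all hypothetical.

```python
def has_canonical_splice_sites(sequence, intron_start, intron_end):
    """True if the intron (1-based, inclusive) begins with GT and ends with AG."""
    intron = sequence[intron_start - 1:intron_end].upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def splice(sequence, introns):
    """Remove the given introns (1-based, inclusive) and join the exons."""
    pieces = []
    cursor = 0
    for start, end in sorted(introns):
        pieces.append(sequence[cursor:start - 1])
        cursor = end
    pieces.append(sequence[cursor:])
    return "".join(pieces)


# Invented toy example: two 6-nt exons flanking a 16-nt intron at positions 7-22.
TOY_SEQUENCE = "ATGGCT" + "GTAAGTCCCTTTTCAG" + "TACTTT"
TOY_INTRONS = [(7, 22)]

for start, end in TOY_INTRONS:
    print((start, end), has_canonical_splice_sites(TOY_SEQUENCE, start, end))
print(splice(TOY_SEQUENCE, TOY_INTRONS))
```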

As disclosed in Japanese Patent Application No. 269,105/96 by the same applicant, the present
polypeptide is produced in human cells as a polypeptide whose N-terminal amino acid is tyrosine rather than
methionine, as observed in SEQ ID NO:1. This suggests that the present genomic DNA contains a leader peptide
region upstream of the 5'-terminus of the region encoding the present polypeptide. A sequence consisting of 36 amino
acids, encoded upstream of the nucleotides 20,469th-20,471st, which are the nucleotides TAC, is described as a
leader peptide in SEQ ID NO:14.

Example 2 

Preparation of recombinant DNA pBGHuGF for expression

0.06 ng of the DNA fragment in Example 1-4 in a concentration of 3 ng/50 µl, 0.02 ng of the DNA fragment obtained
by the methods in Example 1-5, 5 µl of 10 x LA PCR reaction solution, 8 µl of 2.5 mM dNTP-mixed solution, one µl of
the mixed solution of 5 unit/µl TAKARA LA Taq polymerase and 1.1 µg/µl TaqStart Antibody in a ratio of 1:1 by volume,
10 pmol of an oligonucleotide with the sequence of 5'-TCCGAAGCTTAAGATGGCTGCTGAACCAGTA-3' as a sense
primer, chemically synthesized based on the nucleotide sequence of the DNA fragment in Example 1-4, and 10 pmol
of an oligonucleotide with the nucleotide sequence of 5'-GGAAATAATTTTGTTCTCACAGGAGAGAGTTG-3' as an
anti-sense primer, chemically synthesized based on the nucleotide sequence of the DNA fragment in Example 1-5,
were mixed and brought to a volume of 50 µl with sterilized distilled water. After incubation at 94°C for one min, the
mixture was subjected to 5 cycles of incubations at 98°C for 20 sec and at 72°C for 7 min, followed by 25 cycles of
incubations at 98°C for 20 sec and at 68°C for 7 min to perform PCR. The reaction mixture was cleaved with the
restriction enzymes HindIII and SphI to obtain a DNA fragment of about 5,900 bp with HindIII and SphI cleavage sites
at its respective termini.
10 PCR was performed in the same condition as above, but 0.02 ng of the DNA fragment in Example 1 -5, 0.06 ng of 

the DNA fragment obtained in Example 1-6, an oligonucleotide with the nucleotide sequence of 5'-ATGTAGCG- 
G CCG CGG C ATG AAATTTTAATAG CTAG TC-3' as an anti-sense primer, chemically synthesized based on the nucle- 
otide sequence of the DNA fragment in Example 1-6, and an oligonucleotide with the sequence of 5*-CCTGGAATCA- 
GATTACTTTGGCAAGCTTGAATC-3 , as a sense primer, chemically synthesized based on the DNA fragment in Ex- 
's ample 1 -6, were used. The reaction mixture was cleaved by restriction enzymes Not\ and Sph\ to obtain a DNA fragment 
of about 5,600 bp, with cleavage sites by /vbfl and Spfi in its both termini. 
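As a minimal sketch of how the primers relate to the restriction enzymes named above, the following Python code scans the four primer sequences for the recognition sequences of HindIII (AAGCTT), NotI (GCGGCCGC) and SphI (GCATGC). The primer strings are the sequences of this Example joined across line breaks; the dictionary keys and the function name are hypothetical labels introduced for illustration.

```python
# Recognition sequences of the enzymes used in this Example (all palindromic,
# so scanning one strand is sufficient).
RECOGNITION_SITES = {"HindIII": "AAGCTT", "NotI": "GCGGCCGC", "SphI": "GCATGC"}

# Primer sequences from the text, written 5' to 3'. Keys are illustrative labels.
PRIMERS = {
    "PCR 1 sense": "TCCGAAGCTTAAGATGGCTGCTGAACCAGTA",
    "PCR 1 anti-sense": "GGAAATAATTTTGTTCTCACAGGAGAGAGTTG",
    "PCR 2 sense": "CCTGGAATCAGATTACTTTGGCAAGCTTGAATC",
    "PCR 2 anti-sense": "ATGTAGCGGCCGCGGCATGAAATTTTAATAGCTAGTC",
}


def find_sites(sequence):
    """Map each enzyme to the 0-based positions of its site in the sequence."""
    hits = {}
    for enzyme, site in RECOGNITION_SITES.items():
        positions = []
        index = sequence.find(site)
        while index >= 0:
            positions.append(index)
            index = sequence.find(site, index + 1)
        if positions:
            hits[enzyme] = positions
    return hits


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

In this scan the sense primer of the first PCR carries a HindIII site and the anti-sense primer of the second PCR carries a NotI site, consistent with the HindIII and NotI termini used for ligation into the vector. None of the primers contains an SphI site, so the SphI cleavage presumably occurs within the amplified genomic sequence itself.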

A plasmid vector "pRc/CMV", containing a cytomegalovirus promoter and commercialized by Invitrogen Corporation, San Diego, USA, was cleaved with the restriction enzymes HindIII and NotI to obtain a vector fragment of about 5,500 bp. The vector fragment was mixed with the above two DNA fragments of about 5,900 bp and 5,600 bp and reacted with T4 DNA ligase to insert the two DNA fragments into the plasmid vector. An Escherichia coli JM109 strain was transformed with the obtained recombinant DNA, and a transformant carrying the plasmid vector was selected by the colony hybridization method. The selected recombinant DNA was named "pBGHuGF". As shown in FIG.1, the present genomic DNA, with the nucleotide sequence of SEQ ID NO:13, was ligated downstream of the cleavage site of the restriction enzyme HindIII in the recombinant DNA.

Example 3 

Preparation of transformant using CHO cell as host 

CHO-K1 cells (ATCC CCL61) were inoculated into Ham's F12 medium (pH 7.2) containing 10 v/v % bovine fetal serum and proliferated in a conventional manner. The proliferated cells were collected, washed with phosphate-buffered saline (hereinafter abbreviated as "PBS"), and suspended in PBS to give a cell density of 1 x 10^7 cells/ml.

10 µg of the recombinant DNA pBGHuGF in Example 2 and 0.8 ml of the above cell suspension were placed in a cuvette and ice-chilled for 10 min. The cuvette was installed in "GENE PULSER", an electroporation device commercialized by Bio-Rad Laboratories Inc., Brussels, Belgium, and then pulsed once with an electric discharge. After pulsing, the cuvette was immediately taken out and ice-chilled for 10 min. The cell suspension from the cuvette was inoculated into Ham's F12 medium (pH 7.2) containing 10 v/v % bovine fetal serum and cultured under an ambient condition of 5 v/v % CO2 at 37°C for 3 days. G-418 was added to the culture medium to give a final concentration of 400 µg/ml, and culturing was continued for a further 3 weeks under the same conditions. From about 100 colonies formed, 48 colonies were selected, and a portion of each was inoculated into a well of culture plates with Ham's F12 medium (pH 7.2) containing 400 µg/ml G-418 and 10 v/v % bovine fetal serum and cultured as above. Thereafter, 10 mM Tris-HCl buffer (pH 8.5) containing 5.1 mM magnesium chloride, 0.5 w/v % sodium deoxycholate, 1 w/v % NONIDET P-40, 10 µg/ml aprotinin and 0.1 w/v % SDS was added to each well of the culture plates to lyse the cells.

A 50 µl aliquot of each cell lysate was mixed with one ml of glycerol and incubated at 37°C for one hour, and the polypeptides in the cell lysates were then separated by SDS-polyacrylamide gel electrophoresis. The separated polypeptides were transferred to a nitrocellulose membrane in a usual manner, and the membrane was soaked in the culture supernatant of the hybridoma H-1, disclosed in Japanese Patent Kokai No. 231,598/96 by the same applicant as this invention, followed by washing with 50 mM Tris-HCl buffer containing 0.05 v/v % TWEEN 20 to remove the excess monoclonal antibody. Thereafter, the nitrocellulose membrane was soaked for one hr in PBS containing rabbit-derived anti-mouse immunoglobulin antibody labeled with horseradish peroxidase, followed by washing with 50 mM Tris-HCl buffer (pH 7.5) containing 0.05 v/v % TWEEN 20 and soaking in 50 mM Tris-HCl buffer (pH 7.5) containing 0.005 v/v % hydrogen peroxide and 0.3 mg/ml diaminobenzidine to develop color. The clone which produced the polypeptide at a high level was selected based on the color development and named "BGHuGF".

Example 4

Production of polypeptide by transformant and its physicochemical properties 

The transformant BGHuGF in Example 3 was inoculated into Ham's F12 medium (pH 7.2) containing 400 µg/ml G-418 and 10 v/v % bovine fetal serum, and cultured under an ambient condition of 5 v/v % CO2 at 37°C for one week. The proliferated cells were collected, washed with PBS, and then washed with 10-fold volumes of ice-chilled 20 mM Hepes buffer (pH 7.4) containing 10 mM potassium chloride and 0.1 mM disodium ethylenediaminetetraacetate, according to the method described by M. J. Kostura et al. in "Proceedings of the National Academy of Sciences of the USA", vol. 86, pp. 5,227-5,231 (1989). The cells thus obtained were allowed to stand in 3-fold volumes of a fresh preparation of the same buffer under ice-chilled conditions for 20 min, frozen at -80°C, and then thawed to disrupt the cells. The disrupted cells were centrifuged to collect the supernatant.

In parallel, THP-1 cells (ATCC TIB 202), derived from a human acute monocytic leukemia, were similarly cultured and disrupted. The supernatant obtained by centrifuging the disrupted cells was mixed with the supernatant obtained from the transformant BGHuGF and incubated at 37°C for 3 hr to react. The reaction mixture was applied to a column of "DEAE-SEPHAROSE", a gel for ion-exchange chromatography commercialized by Pharmacia LKB Biotechnology AB, Uppsala, Sweden, which had been equilibrated with 10 mM phosphate buffer (pH 6.6) before use. After washing the column with 10 mM phosphate buffer (pH 6.6), 10 mM phosphate buffer (pH 6.6) with a stepwise gradient of NaCl increasing from 0 M to 0.5 M was fed to the column, and fractions eluted at about 0.2 M NaCl were collected. The fractions were dialyzed against 10 mM phosphate buffer (pH 6.8) and then applied to a column of "DEAE 5PW", a gel for ion-exchange chromatography commercialized by TOSOH Corporation, Tokyo, Japan. To the column was fed 10 mM phosphate buffer (pH 6.8) with a linear gradient of NaCl increasing from 0 M to 0.5 M, and fractions eluted at about 0.2-0.3 M NaCl were collected.

While the obtained fractions were pooled and dialyzed against PBS, a gel for immunoaffinity chromatography with the monoclonal antibody was prepared according to the method disclosed in Japanese Patent Kokai No. 231,598/96 by the same applicant as this invention. After the gel was charged into a plastic column and washed with PBS, the above dialyzed solution was applied to the column. To the column was fed 100 mM glycine-HCl buffer (pH 2.5), and the eluted fractions, which contained a polypeptide capable of inducing the production of IFN-γ by immunocompetent cells, were collected. After the collected fractions were dialyzed against sterilized distilled water and concentrated by membrane filtration, the concentrate was lyophilized to obtain a purified solid polypeptide in a yield of about 15 mg per liter of culture.

Example for Reference 

Expression in Escherichia coli

As disclosed in Japanese Patent Kokai No. 193,098/96, a transformant pKHuGF, which was obtained by introducing a cDNA with the nucleotide sequence of SEQ ID NO:2 into Escherichia coli as a host, was inoculated into L-broth medium containing 50 ng/ml ampicillin and cultured at 37°C for 18 hr under shaking conditions. The cells were collected by centrifuging the resulting culture and then suspended in a mixture solution (pH 7.2) of 139 mM NaCl, 7 mM NaH2PO4 and 3 mM Na2HPO4, followed by ultrasonication to disrupt the cells. After the cell disruptants were centrifuged, the supernatant was subjected to purification steps similar to those in Example 4-1 to obtain a purified solid polypeptide in a yield of about 5 mg per liter of culture.

Comparing the yields of the polypeptides in Example for Reference and in Example 4-1 shows that the use of a transformant formed by introducing a genomic DNA encoding the present polypeptide into a mammalian cell as a host strongly elevates the yield of the polypeptide per culture.

Example 4-2 

Physicochemical property of polypeptide

Example 4-2(a) 

Biological activity 

Blood was collected from a healthy donor using a syringe containing heparin and then diluted with a 2-fold volume of serum-free RPMI-1640 medium (pH 7.4). The blood was overlaid on ficoll, commercialized by Pharmacia LKB Biotechnology AB, Uppsala, Sweden, and centrifuged to obtain lymphocytes, which were then washed with RPMI-1640 medium containing 10 v/v % bovine fetal serum before being suspended in a fresh preparation of the same medium to give a cell density of 5 x 10^6 cells/ml. 0.15 ml aliquots of the cell suspension were distributed into the wells of 96-well microplates.

To the wells with the cells were distributed 0.05 ml aliquots of solutions of the polypeptide in Example 4-1, diluted with RPMI-1640 medium (pH 7.4) containing 10 v/v % bovine fetal serum to give desired concentrations. 0.05 ml aliquots of fresh preparations of the same medium with 2.5 µg/ml concanavalin A were further added to the wells, which were then cultured in a 5 v/v % CO2 incubator at 37°C for 24 hr. After the cultivation, 0.1 ml of the culture supernatant was collected from each well and examined for IFN-γ by a usual enzyme immunoassay. In parallel, control systems, using the polypeptide in Example for Reference in place of that in Example 4-1 or using no polypeptide, were treated as above. The results are in Table 1. IFN-γ in Table 1 is expressed in international units (IU), calculated based on the IFN-γ standard, Gg23-901-530, obtained from the International Institute of Health, USA.
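The volumes stated above fix the final composition of each well. The short Python sketch below works this out under the assumption that the three aliquots are simply additive; the constant and function names are hypothetical and introduced only for illustration.

```python
# Volumes added to each well, in ml, as stated in the text.
CELL_SUSPENSION_ML = 0.15
POLYPEPTIDE_ML = 0.05
CON_A_ML = 0.05
CON_A_STOCK_UG_PER_ML = 2.5  # concanavalin A concentration in the added medium


def final_concentration(stock, added_ml, total_ml):
    """Concentration after diluting added_ml of stock into total_ml."""
    return stock * added_ml / total_ml


# Assumption: the aliquots are additive and nothing else is in the well.
total_ml = CELL_SUSPENSION_ML + POLYPEPTIDE_ML + CON_A_ML
con_a_final = final_concentration(CON_A_STOCK_UG_PER_ML, CON_A_ML, total_ml)
polypeptide_dilution = total_ml / POLYPEPTIDE_ML

print(f"total volume per well: {total_ml:.2f} ml")
print(f"final concanavalin A: {con_a_final:.2f} ug/ml")
print(f"polypeptide solution diluted {polypeptide_dilution:.0f}-fold in the well")
```

On these assumptions each well holds 0.25 ml, concanavalin A ends at 0.5 µg/ml, and the polypeptide solution is diluted 5-fold on addition.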



Table 1

Sample of polypeptide        IFN-γ production (IU/ml)

Example 4-2(a)               3.4 x 10^5
Example for Reference        1.7 x 10^5



Table 1 indicates that the lymphocytes, as immunocompetent cells, produce IFN-γ by the action of the present polypeptide.

It is more remarkable that the polypeptide in Example 4-1 induced more IFN-γ production than that in Example for Reference. Considering this together with the difference in the yields of the polypeptides described in Example for Reference, the following can be presumed: even if DNAs are substantially equivalent in encoding the same amino acid sequence, not only may their expression efficiencies differ, but the products expressed by them may also differ significantly in biological activity as a result of post-translational modifications by intracellular enzymes, depending on the types of the DNAs and their hosts. Here, (a) one type uses a transformant formed by introducing a DNA, which is a cDNA, into a microorganism as a host, and (b) the other type uses a transformant formed by introducing the present genomic DNA into a mammalian cell as a host.
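The two comparisons drawn above can be put as simple ratios. The Python sketch below uses only the approximate yields stated in Example 4 and Example for Reference and the values in Table 1; the labels and the function name are hypothetical, and the ratios are no more precise than the rounded figures they are computed from.

```python
def fold_difference(value, reference):
    """Ratio of value to reference; the reference must be positive."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return value / reference


# Approximate yields of purified polypeptide, in mg per liter of culture.
YIELD_MG_PER_L = {"genomic DNA in CHO cells": 15.0, "cDNA in E. coli": 5.0}

# IFN-gamma production from Table 1, in IU/ml.
IFN_IU_PER_ML = {"genomic DNA in CHO cells": 3.4e5, "cDNA in E. coli": 1.7e5}

yield_fold = fold_difference(
    YIELD_MG_PER_L["genomic DNA in CHO cells"], YIELD_MG_PER_L["cDNA in E. coli"]
)
activity_fold = fold_difference(
    IFN_IU_PER_ML["genomic DNA in CHO cells"], IFN_IU_PER_ML["cDNA in E. coli"]
)

print(f"yield: about {yield_fold:.0f}-fold")
print(f"IFN-gamma induction: about {activity_fold:.0f}-fold")
```

With these figures the yield per culture is about 3-fold higher and the induced IFN-γ level about 2-fold higher for the polypeptide from the mammalian transformant.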

Example 4-2(b)

Molecular weight 

SDS-polyacrylamide gel electrophoresis of the polypeptide in Example 4-1 in the presence of 2 w/v % dithiothreitol as a reducing agent, according to the method reported by U. K. Laemmli et al. in "Nature", Vol. 227, pp. 680-685 (1970), exhibited a main band of a protein capable of inducing IFN-γ at a position corresponding to a molecular weight of about 18,000-19,500 daltons. The molecular weight markers used in the analysis were bovine serum albumin (67,000 daltons), ovalbumin (45,000 daltons), carbonic anhydrase (30,000 daltons), soybean trypsin inhibitor (20,100 daltons) and α-lactalbumin (14,000 daltons).

Example 4-2(c)

N-Terminal amino acid sequence 

Conventional analysis using "MODEL 473A", a protein sequencer commercialized by Perkin-Elmer Corp., Norwalk, USA, revealed that the polypeptide in Example 4-1 had the amino acid sequence of SEQ ID NO:15 in the N-terminal region.

Judging collectively from this result, from the fact that SDS-polyacrylamide gel electrophoresis exhibited a main band at a position corresponding to a molecular weight of about 18,000-19,500 daltons, and from the fact that the molecular weight calculated from the amino acid sequence of SEQ ID NO:1 was 18,199 daltons, it can be concluded that the polypeptide in Example 4-1 has the amino acid sequence of SEQ ID NO:1.

As described above, the present invention is based on the identification of a genomic DNA encoding the polypeptide which induces the production of IFN-γ by immunocompetent cells. The present genomic DNA efficiently expresses the present polypeptide when introduced into mammalian host cells. The polypeptide features higher biological activities than that obtained by the cDNA expression in Escherichia coli. Therefore, the present genomic DNA is useful in recombinant DNA techniques for preparing the polypeptide capable of inducing IFN-γ production by immunocompetent cells. The present genomic DNA is also useful in gene therapy for diseases including viral diseases, bacterial infectious diseases, malignant tumors and immunopathies.

Thus, the present invention is a significant invention which has a remarkable effect and makes a great contribution to this field.

While there has been described what is at present considered to be the preferred embodiments of the present invention, it will be understood that various modifications may be made therein, and it is intended to cover in the appended claims all such modifications as fall within the true spirit and scope of the invention.



SEQUENCE LISTING

(1) GENERAL INFORMATION:

(i) APPLICANT:
    NAME: KABUSHIKI KAISHA HAYASHIBARA SEIBUTSU KAGAKU KENKYUJO
(ii) TITLE OF INVENTION: GENOMIC DNA ENCODING A POLYPEPTIDE CAPABLE OF INDUCING THE PRODUCTION OF INTERFERON-γ
(iii) NUMBER OF SEQUENCES: 15
(iv) ADDRESS:
    (A) ADDRESSEE: KABUSHIKI KAISHA HAYASHIBARA SEIBUTSU KAGAKU KENKYUJO
    (B) STREET: 2-3, 1-CHOME, SHIMOISHII
    (C) CITY: OKAYAMA
    (E) COUNTRY: JAPAN
    (F) POSTAL CODE (ZIP): 700
(v) COMPUTER READABLE FORM:
    (A) MEDIUM TYPE: Floppy disk
    (B) COMPUTER: IBM PC compatible
    (C) OPERATING SYSTEM: PC-DOS/MS-DOS
(vii) PRIOR APPLICATION DATA:
    (A) APPLICATION NUMBER: JP 185,305/96
    (B) FILING DATE: June 27, 1996

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 157 amino acids
    (B) TYPE: amino acid
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser Val Ile Arg Asn Leu Asn
1               5                   10                  15
Asp Gln Val Leu Phe Ile Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp
            20                  25                  30
Met Thr Asp Ser Asp Cys Arg Asp Asn Ala Pro Arg Thr Ile Phe Ile
        35                  40                  45
Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile
    50                  55                  60
Ser Val Lys Cys Glu Lys Ile Ser Xaa Leu Ser Cys Glu Asn Lys Ile
65                  70                  75                  80
Ile Ser Phe Lys Glu Met Asn Pro Pro Asp Asn Ile Lys Asp Thr Lys
                85                  90                  95
Ser Asp Ile Ile Phe Phe Gln Arg Ser Val Pro Gly His Asp Asn Lys
            100                 105                 110
Met Gln Phe Glu Ser Ser Ser Tyr Glu Gly Tyr Phe Leu Ala Cys Glu
        115                 120                 125
Lys Glu Arg Asp Leu Phe Lys Leu Ile Leu Lys Lys Glu Asp Glu Leu
    130                 135                 140
Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu Asp
145                 150                 155

(3) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1120 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: cDNA to mRNA
(iii) HYPOTHETICAL: No
(iv) ANTI-SENSE: No
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: liver
(ix) FEATURE:
    (A) NAME/KEY: 5' UTR
    (B) LOCATION: 1..177
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: leader peptide
    (B) LOCATION: 178..285
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: mat peptide
    (B) LOCATION: 286..756
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: 3' UTR
    (B) LOCATION: 757..1120
    (C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

GCCTGGACAG TCAGCAAGGA ATTGTCTCCC AGTGCATTTT GCCCTCCTGG CTGCCAACTC 60
TGGCTGCTAA AGCGGCTGCC ACCTGCTGCA GTCTACACAG CTTCGGGAAG AGGAAAGGAA 120
CCTCAGACCT TCCAGATCGC TTCCTCTCGC AACAAACTAT TTGTCGCAGG AATAAAG 177

ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA ATG 225
Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala Met
    -35                 -30                 -25
AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA GCT GAA GAT GAT GAA AAC 273
Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala Glu Asp Asp Glu Asn
-20                 -15                 -10                 -5
CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA TCT AAA TTA TCA GTC ATA 321
Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser Val Ile
                1               5                   10
AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT GAC CAA GGA AAT CGG CCT 369
Arg Asn Leu Asn Asp Gln Val Leu Phe Ile Asp Gln Gly Asn Arg Pro
        15                  20                  25
CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT AGA GAT AAT GCA CCC CGG 417
Leu Phe Glu Asp Met Thr Asp Ser Asp Cys Arg Asp Asn Ala Pro Arg
    30                  35                  40
ACC ATA TTT ATT ATA AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG 465
Thr Ile Phe Ile Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met
45                  50                  55                  60
GCT GTA ACT ATC TCT GTG AAG TGT GAG AAA ATT TCA AYT CTC TCC TGT 513
Ala Val Thr Ile Ser Val Lys Cys Glu Lys Ile Ser Xaa Leu Ser Cys
                65                  70                  75
GAG AAC AAA ATT ATT TCC TTT AAG GAA ATG AAT CCT CCT GAT AAC ATC 561
Glu Asn Lys Ile Ile Ser Phe Lys Glu Met Asn Pro Pro Asp Asn Ile
            80                  85                  90
AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG AGA AGT GTC CCA GGA 609
Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln Arg Ser Val Pro Gly
        95                  100                 105
CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA TAC GAA GGA TAC TTT 657
His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser Tyr Glu Gly Tyr Phe
    110                 115                 120
CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA CTC ATT TTG AAA AAA 705
Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys Leu Ile Leu Lys Lys
125                 130                 135                 140
GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC ACT GTT CAA AAC GAA 753
Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu
                145                 150                 155
GAC TAGCTATTAA AATTTCATGC CGGGCGCAGT GGCTCACGCC TGTAATCCCA 806
Asp

GCCCTTTGGG AGGCTGAGGC GGGCAGATCA CCAGAGGTCA GGTGTTCAAG ACCAGCCTGA 866
CCAACATGGT GAAACCTCAT CTCTACTAAA AATACTAAAA ATTAGCTGAG TGTAGTGACG 926
CATGCCCTCA ATCCCAGCTA CTCAAGAGGC TGAGGCAGGA GAATCACTTG CACTCCGGAG 986
GTAGAGGTTG TGGTGAGCCG AGATTGCACC ATTGCGCTCT AGCCTGGGCA ACAACAGCAA 1046
AACTCCATCT CAAAAAATAA AATAAATAAA TAAACAAATA AAAAATTCAT AATGTGAAAA 1106
AAAAAAAAAA AAAA 1120
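The pairing of codons and amino acids in the listing above can be spot-checked with a short translation routine. The Python sketch below is illustrative only: it builds the standard genetic code table, translates the last 17 codons of the coding region of SEQ ID NO:2 (nucleotides 706-756) plus the following stop codon, and reports any codon containing an ambiguity code (such as AYT) as "X", corresponding to Xaa. The names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding sequence in frame 1 up to the first stop codon.

    Codons not in the table (e.g. those with ambiguity codes) become 'X'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Nucleotides 706-756 of SEQ ID NO:2 followed by the TAG stop codon.
CDS_TAIL = "GAGGATGAATTGGGGGATAGATCTATAATGTTCACTGTTCAAAACGAAGACTAG"
print(translate(CDS_TAIL))
```

The one-letter output corresponds to Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu Asp, the C-terminal residues 141-157 shown in the listing.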



(4) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 135 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1..135
    (C) IDENTIFICATION METHODS: S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

 AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA TCT AAA TTA TCA
Glu Asn Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser
    -5                  1               5                   10
GTC ATA AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT GAC CAA GGA AAT
Val Ile Arg Asn Leu Asn Asp Gln Val Leu Phe Ile Asp Gln Gly Asn
                15                  20                  25
CGG CCT CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT AGA G
Arg Pro Leu Phe Glu Asp Met Thr Asp Ser Asp Cys Arg Asp
            30                  35                  40



(5) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 134 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1..134
    (C) IDENTIFICATION METHODS: S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

 AT AAT GCA CCC CGG ACC ATA TTT ATT ATA AGT ATG TAT AAA GAT AGC 47
Asp Asn Ala Pro Arg Thr Ile Phe Ile Ile Ser Met Tyr Lys Asp Ser
40                  45                  50                  55
CAG CCT AGA GGT ATG GCT GTA ACT ATC TCT GTG AAG TGT GAG AAA ATT 95
Gln Pro Arg Gly Met Ala Val Thr Ile Ser Val Lys Cys Glu Lys Ile
                60                  65                  70
TCA ACT CTC TCC TGT GAG AAC AAA ATT ATT TCC TTT AAG 134
Ser Thr Leu Ser Cys Glu Asn Lys Ile Ile Ser Phe Lys
            75                  80



(6) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 87 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1..87
    (C) IDENTIFICATION METHODS: S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GAATAAAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG
         Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val
             -35                 -30                 -25
GCA ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G
Ala Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala
        -20                 -15                 -10



(7) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 12 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: exon
    (B) LOCATION: 1..12
    (C) IDENTIFICATION METHODS: S
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

 CT GAA GAT GAT G
Ala Glu Asp Asp Glu
-10



(8) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 2167 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: exon + 3' UTR
    (B) LOCATION: 1..2167
    (C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

GAA ATG AAT CCT CCT GAT AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA 48
Glu Met Asn Pro Pro Asp Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile
85                  90                  95                  100
TTC TTT CAG AGA AGT GTC CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA 96
Phe Phe Gln Arg Ser Val Pro Gly His Asp Asn Lys Met Gln Phe Glu
                105                 110                 115
TCT TCA TCA TAC GAA GGA TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC 144
Ser Ser Ser Tyr Glu Gly Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp
            120                 125                 130
CTT TTT AAA CTC ATT TTG AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT 192
Leu Phe Lys Leu Ile Leu Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser
        135                 140                 145
ATA ATG TTC ACT GTT CAA AAC GAA GAC TAGCTAT TAAAATTTCA TGCCGGGCGC 246
Ile Met Phe Thr Val Gln Asn Glu Asp
    150                 155
AGTGGCTCAC GCCTGTAATC CCAGCCCTTT GGGAGGCTGA GGCGGGCAGA TCACCAGAGG 306
TCAGGTGTTC AAGACCAGCC TGACCAACAT GGTGAAACCT CATCTCTACT AAAAATACAA 366
AAAATTAGCT GAGTGTAGTG ACCCATGCCC TCAATCCCAG CTACTCAAGA GGCTGAGGCA 426
GGAGAATCAC TTGCACTCCG GAGGTGGAGG TTGTGGTGAG CCGAGATTGC ACCATTGCGC 486
TCTAGCCTGG GCAACAACAG CAAAACTCCA TCTCAAAAAA TAAAATAAAT AAATAAACAA 546
ATAAAAAATT CATAATGTGA ACTGTCTGAA TTTTTATGTT TAGAAAGATT ATGAGATTAT 606
TAGTCTATAA TTGTAATGGT GAAATAAAAT AAATACCAGT CTTGAAAAAC ATCATTAAGA 666
AATGAATGAA CTTTCACAAA AGCAAACAAA CAGACTTTCC CTTATTTAAG TGAATAAAAT 726
AAAATAAAAT AAAATAATGT TTAAAAAATT CATAGTTTGA AAACATTCTA CATTGTTAAT 786
TGGCATATTA ATTATACTTA ATATAATTAT TTTTAAATCT TTTGGGTTAT TAGTCCTAAT 846
GACAAAAGAT ATTGATATTT GAACTTTCTA ATTTTTAAGA ATATCGTTAA ACCATCAATA 906
TTTTTATAAG GAGGCCACTT CACTTGACAA ATTTCTGAAT TTCCTCCAAA GTCAGTATAT 966
TTTTAAAATT CAGTTTGATC CTGAATCCAG CAATATATAA AAGGGATTAT ATACTCTGGC 1026
CAACTGACAT TCATCCTAGG AATGCAAAGA TGGTTTAATA TCCTAAAATC AATTAACATA 1086
ACATACTATA TTAATAAAGT ATCAAAACAG TATTCTCATC TTTTTTTCTT TTTTCACAAT 1146
TCCTTGGTTA CACTATCATC TCAATAGATG CAGAAAAAGC ATTTGACAAA ATCCAATTCA 1206
TAATAAAAAT TCTCAAACTT GAAAGAGAAC ATCATAAAGG CATCTATGAA AAACCTACAG 1266
CTAATATCAT ACTTAACGAT GAAAAACTGA ATTATTTTAC CCTAAGATCA AGAATAATGC 1326
AAGCATGTCA GCTCTTGCAA CTTCTATTCA ACATTGTACT GGAGGTTCTA GCCAGAGCAA 1386
CCATACAATA AATAAAAATA AAAGGCACCC AGATTAGAAA GGAAGTCTTT ATTTGCAGAC 1446
AACATGGTTC TTTATGCAGA AAACCGTCAG GAATACACAC ACATGTTAGA ACTAATAAGT 1506
TCAGCAAGGT TGCAGGTTGC AATATCAATA TGCAAAAATA CATTGAAGGC TGGGCTCAGT 1566
GGAGATGGCA TGTACCTTTC GTCCCAGCTA CTTGGGAGGC TGAGGTAGGA GGATCACTTG 1626
AGGTGAGGAG TTTGAGGCTA TAGTGCAATG TGATCTTGCC TGTGAATAGC CACTGCACTC 1686
GAGCCTAGGC AACAAAGTGA GACCCCGTCT CCAAAAAAAA AAATGGTATA TTGGTATTTC 1746
TGTATATGAA CAATGAATGA TCTGAAAACA AGAAAATTCC ATTCACGATG GTATTAAAAA 1806
AATAAAATAC AAATAAATTT AGCAAAATAA TTATAAAACT TGTACATCGA AAATTTCAAA 1866
GCACTCTGAG GGAAATTAAA GATGATCTAA ATAATTGGAG AGACACTCTA TGATCACTGA 1926
TTGGAAAATT CATTCAATAT TGTTAAGATA ACAATTGTCC CCAAATTGAT GCATGCATTC 1986
AATTTAGTCT TCATCAAAAT TCCAGCAGGG TTTTTGCAGA AATTGACAAG CTGTACCCAA 2046
AATGTATATG GAAATGAAAA GACCCAGAAG AGCAAATAAT TTTTTAAAAA CAAAGTTGGA 2106
AAACTTTTAC TTCCTAATTT TAAAACTTAC TATAAACCTA AAGTTATCAA GACCATTTAG 2166
T 2167

(9) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 1334 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: intron
    (B) LOCATION: 1..1334
    (C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

GTATTTTTTT TAATTCGCAA ACATAGAAAT GACTAGCTAC TTCTTCCCAT TCTGTTTTAC 60
TGCTTACATT GTTCCGTGCT AGTCCCAATC CTCAGATGAA AAGTCACAGG AGTGACAATA 120
ATTTCACTTA CAGGAAACTT TATAAGGCAT CCACGTTTTT TAGTTGGGGT AAAAAATTGG 180
ATACAATAAG ACATTGCTAG GGGTCATGCC TCTCTGAGCC TGCCTTTGAA TCACCAATCC 240
CTTTATTGTG ATTGCATTAA CTGTTTAAAA CCTCTATAGT TGGATGCTTA ATCCCTGCTT 300
GTTACAGCTG AAAATGCTGA TAGTTTACCA GGTGTGGTGG CATCTATCTG TAATCCTAGC 360
TACTTGGGAG GCTCAAGCAG GAGGATTGCT TGAGGCCAGG ACTTTGAGGC TGTAGTACAC 420
TGTGATCGTA CCTGTGAATA GCCACTGCAC TCCAGCCTGG GTGATATACA GACCTTGTCT 480
CTAAAATTAA AAAAAAAAAA AAAAAAAACC TTAGGAAAGG AAATTGATCA AGTCTACTGT 540
GCCTTCCAAA ACATGAATTC CAAATATCAA AGTTAGGCTG AGTTGAAGCA GTGAATGTGC 600
ATTCTTTAAA AATACTGAAT ACTTACCTTA ACATATATTT TAAATATTTT ATTTAGCATT 660
TAAAAGTTAA AAACAATCTT TTAGAATTCA TATCTTTAAA ATACTCAAAA AAGTTGCAGC 720
GTGTGTGTTG TAATACACAT TAAACTGTGG GGTTGTTTGT TTGTTTGAGA TGCAGTTTCA 780
CTCTGTCACC CAGGCTGAAG TGCAGTGCAG TGCAGTGGTG TGATCTCGGC TCACTACAAC 840
CTCCACCTCC CACGTTCAAG CGATTCTCAT GCCTCAGTCT CCCGAGTAGG TGGGATTACA 900
GGCATGCACC ACTTACACCC GGCTAATTTT TGTATTTTTA GTAGAGCTGG GGTTTCACCA 960
TGTTGGCCAG GCTGGTCTCA AACCCCTAAC CTCAAGTGAT CTGCCTGCCT CAGCCTCCCA 1020
AACAAACAAA CAACCCCACA GTTTAATATG TGTTACAACA CACATGCTGC AACTTTTATG 1080
AGTATTTTAA TGATATAGAT TATAAAAGGT TGTTTTTAAC TTTTAAATGC TGGGATTACA 1140
GGCATGAGCC ACTGTGCCAG GCCTGAACTG TGTTTTTAAA AATGTCTGAC CAGCTGTACA 1200
TAGTCTCCTG CAGACTGGCC AAGTCTCAAA GTGGGAACAG GTGTATTAAG GACTATCCTT 1260
TGGTTAAATT TCCGCAAATG TTCCTGTGCA AGAATTCTTC TAACTAGAGT TCTCATTTAT 1320
TATATTTATT TCAG 1334



(10) INFORMATION FOR SEQ ID NO : 9 : 
( i J SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 4 773 base pairs 

(B) TYPE:nucleic acid 

(C) STRANDEDNESS : double 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: Genomic DNA 
(vi) ORIGINAL SOURCE : 

(A) ORGANISM: human 

(F) TISSUE TYPE: placenta 
(iX) FEATURE: 

(A) NAME/ KEY : intron 

(B) LOCATION: 1. .4773 

(C) IDENTIFICATION METHODS : E 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO : 9 : 

GTAAGACTGA GCCTTACTTT GTTTTCAATC ATGTTAATAT AATCAATATA ATTAGAAATA 60 

TAACATTATT TCTAATGTTA ATATAAGTAA TGTAATTAGA AAACTCAAAT ATCCTCAGAC 120 

CAACCTTTTG TCTAGAACAG AAATAACAAG AAGCAGAGAA CCATTAAAGT GAATACTTAC 180 

TAAAAATTAT CAAACTCTTT ACCTATTGTG ATAATGATGG TTTTTCTGAG CCTGTCACAG 240 

GGGAAGAGGA GATACAACAC TTGTTTTATG ACCTGCATCT CCTGAACAAT CAGTCTTTAT 3 00 

ACAAATAATA ATGTAGAATA CATATGTGAG TTATACATTT AAGAATAACA TGTGACTTTC 3 60 

CAGAATGAGT TCTGCTATGA AGAATGAAGC TAATTATCCT TCTATATTTC TACACCTTTG 420
TAAATTATGA TAATATTTTA ATCCCTAGTT GTTTTGTTGC TGATCCTTAG CCTAAGTCTT 480
AGACACAAGC TTCAGCTTCC AGTTGATGTA TGTTATTTTT AATGTTAATC TAATTGAATA 540
AAAGTTATGA GATCAGCTGT AAAAGTAATG CTATAATTAT CTTCAAGCCA GGTATAAAGT 600
ATTTCTGGCC TCTACTTTTT CTCTATTATT CTCCATTATT ATTCTCTATT ATTTTTCTCT 660
ATTTCCTCCA TTATTGTTAG ATAAACCACA ATTAACTATA GCTACAGACT GAGCCAGTAA 720

GAGTAGCCAG GGATGCTTAC AAATTGGCAA TGCTTCAGAG GAGAATTCCA TGTCATGAAG 7 80 
ACTCTTTTTG AGTGGAGATT TGCCAATAAA TATCCGCTTT CATGCCCACC CAGTCCCCAC 84 0 
TGAAAGACAG TTAGGATATG ACCTTAGTGA AGGTACCAAG GGGCAACTTG GTAGGGAGAA 90 0 
AAAAGCCACT CTAAAATATA ATCCAAGTAA GAACAGTGCA TATGCAACAG ATACAGCCCC 960
CAGACAAATC CCTCAGCTAT CTCCCTCCAA CCAGAGTGCC ACCCCTTCAG GTGACAATTT 1020
GGAGTCCCCA TTCTAGACCT GACAGGCAGC TTAGTTATCA AAATAGCATA AGAGGCCTGG 1080
GATGGAAGGG TAGGGTGGAA AGGGTTAAGC ATGCTGTTAC TGAACAACAT AATTAGAAGG 114 0 
GAAGGAGATG GCCAAGCTCA AGCTATGTGG GATAGAGGAA AACTCAGCTG CAGAGGCAGA 12 00 
TTCAGAAACT GGGATAAGTC CGAACCTACA GGTGGATTCT TGTTGAGGGA GACTGGTGAA 126 0 
AATGTTAAGA AGATGGAAAT AATGCTTGGC ACTTAGTAGG AACTGGGCAA ATCCATATTT 132 0 
15 GGGGGAGCCT GAAGTTTATT CAATTTTGAT GGCCCTTTTA AATAAAAAGA ATGTGGCTGG 13 80 

GCGTGGTGGC TCACACCTGT AATCCCAGCA CTTTGGGAGG CCGAGGGGGG CGGATCACCT 144 0 
GAAGTCAGGA GTTCAAGACC AGCCTGACCA ACATGGAGAA ACCCCATCTC TACTAAAAAT 150 0 
ACAAAATTAG CTGGGCGTGG TGGCATATGC CTGTAATCCC AGCTACTCGG GAGGCTGAGG 156 0 
CAGGAGAATC TTTTGAACCC GGGAGGCAGA GGTTGCGATG AGCCTAGATC GTGCCATTGC 16 2 0 
ACTCCAGCCT GGGCAACAAG AGCAAAACTC GGTCTCAAAA AAAAAAAAAA AAAAGTGAAA 1680 
20 TTAACCAAAG GCATTAGCTT AATAATTTAA TACTGTTTTT AAGTAGGGCG GGGGGTGGCT 174 0 

GGAAGAGATC TGTGTAAATG AGGGAATCTG ACATTTAAGC TTCATCAGCA TCATAGCAAA 1800
TCTGCTTCTG GAAGGAACTC AATAAATATT AGTTGGAGGG GGGGAGAGAG TGAGGGGTGG 18 60 
ACTAGGACCA GTTTTAGCCC TTGTCTTTAA TCCCTTTTCC TGCCACTAAT AAGGATCTTA 192 0 
GCAGTGGTTA TAAAAGTGGC CTAGGTTCTA GATAATAAGA TACAACAGGC CAGGCACAGT 198 0 
GGCTCATGCC TATAATCCCA GCACTTTGGG AGGGCAAGGC GAGTGTCTCA CTTGAGATCA 204 0 
25 GGAGTTCAAG ACCAGCCTGG CCAGCATGGC GATACTCTGT CTCTACTAAA AAAAATACAA 2100 

AAATTAGCCA GGCATGGTGG CATGCACCTG TAATCCCAGC TACTCGTGAG CCTGAGGCAG 2160 
AAGAATCGCT TGAAACCAGG AGGTGTAGGC TGCAGTGAGC TGAGATCGCA CCACTGCACT 22 2 0 
CCAGCCTGGG CGACAGAATG AGACTTTGTC TCAAAAAAAG AAAAAGATAC AACAGGCTAC 22 8 0 
CCTTATGTGC TCACCTTTCA CTGTTGATTA CTAGCTATAA AGTCCTATAA AGTTCTTTGG 234 0 
TCAAGAACCT TGACAACACT AAGAGGGATT TGCTTTGAGA GGTTACTGTC AGAGTCTGTT 2400 
TCATATATAT ACATATACAT GTATATATGT ATCTATATCC AGGCTTGGCC AGGGTTCCCT 24 6 0 
CAGACTTTCC AGTGCACTTG GGAGATGTTA GGTCAATATC AACTTTCCCT GGATTCAGAT 2 52 0 
TCAACCCCTT CTGATGTAAA AAAAAAAAAA AAAAAGAAAG AAATCCCTTT CCCCTTGGAG 2580 
CACTCAAGTT TCACCAGGTG GGGCTTTCCA AGTTGGGGGT TCTCCAAGGT CATTGGGATT 2640 
GCTTTCACAT CCATTTGCTA TGTACCTTCC CTATGATGGC TGGGAGTGGT CAACATCAAA 2700
ACTAGGAAAG CTACTGCCCA AGGATGTCCT TACCTCTATT CTGAAATGTG CAATAAGTGT 2760

GATTAAAGAG ATTGCCTGTT CTACCTATCC ACACTCTCGC TTTCAACTGT AACTTTCTTT 28 2 0 
TTTTCTTTTT TTCTTTTTTT CTTTTTTTTT GAAACGGAGT CTCGCTCTGT CGCCCAGGCT 28 8 0 
AGAGTGCAGT GGCACGATCT CAGCTCACTG CAAGCTCTGC CTCCCGGGTT CACGCCATTC 2 94 0 
TCCTGCCTCA CCCTCCCAAG CAGCTGGGAC TACAGGCGCC TGCCACCATG CCCAGCTAAT 3 0 00 
TTTTTGTATT TTTAGTAGAG ACGGGGTTTC ACCGTGTTAG CCAGGATGGT CTCGATCTCC 3 06 0 
40 TGAACTTGTG ATCCGCCCGC CTCAGCCTCC CAAAGTGCTG GGATTACAGG CGTGAGCCAT 312 0 

CGCACCCGGC TCAACTGTAA CTTTCTATAC TGGTTCATCT TCCCCTGTAA TGTTACTAGA 318 0 
GCTTTTGAAG TTTTGGCTAT GGATTATTTC TCATTTATAC ATTAGATTTC AGATTAGTTC 3240
CAAATTGATG CCCACAGCTT AGGGTCTCTT CCTAAATTGT ATATTGTAGA CAGCTGCAGA 33 0 0 
AGTGGGTGCC AATAGGGGAA CTAGTTTATA CTTTCATCAA CTTAGGACCC ACACTTGTTG 3 3 60 
ATAAAGAACA AAGGTCAAGA GTTATGACTA CTGATTCCAC AACTGATTGA GAAGTTGGAG 342 0 
45 ATAACCCCGT GACCTCTGCC ATCCAGAGTC TTTCAGGCAT CTTTGAAGGA TGAAGAAATG 34 8 0 

CTATTTTAAT TTTGGAGGTT TCTCTATCAG TGCTTAGGAT CATGGGAATC TGTGCTGCCA 3 54 0 
TGAGGCCAAA ATTAAGTCCA AAACATCTAC TGGTTCCAGG ATTAACATGG AAGAACCTTA 3 6 00 
GGTGGTGCCC ACATGTTCTG ATCCATCCTG CAAAATAGAC ATGCTGCACT AACAGGAAAA 3 6 60 
GTGCAGGCAG CACTACCAGT TGGATAACCT GCAAGATTAT AGTTTCAAGT AATCTAACCA 3 720 
TTTCTCACAA GGCCCTATTC TGTGACTGAA ACATACAAGA ATCTGCATTT GGCCTTCTAA 3 780 
GGCAGGGCCC AGCCAAGGAG ACCATATTCA GGACAGAAAT TCAAGACTAC TATGGAACTG 3840

GAGTGCTTGG CAGGGAAGAC AGAGTCAAGG ACTGCCAACT GAGCCAATAC AGCAGGCTTA 3 900 
CACAGGAACC CAGGGCCTAG CCCTACAACA ATTATTGGGT CTATTCACTG TAAGTTTTAA 3 960 
TTTCAGGCTC CACTGAAAGA GTAAGCTAAG ATTCCTGGCA CTTTCTGTCT CTCTCACAGT 4 020 
TGGCTCAGAA ATGAGAACTG GTCAGGCCAG GCATGGTGGC TTACACCTGG AATCCCAGCA 4080
CTTTGGGAGG CCGAAGTGGG AGGGTCACTT GAGGCCAGGA GTTCAGGACC AGCTTAGGCA 4140
ACAAAGTGAG ATACCCCCTG ACCCCTTCTC TACAAAAATA AATTTTAAAA ATTAGCCAAA 4200
TGTGGTGGTG TATACTTACA GTCCCAGCTA CTCAGGAGGC TGAGGCAGGG GGATTGCTTG 4260
AGCCCAGGAA TTCAAGGCTG CAGTGAGCTA TGATTTCACC ACTGCACTTC TGGCTGGGCA 4320
ACAGAGCGAG ACCCTGTCTC AAAGCAAAAA GAAAAAGAAA CTAGAACTAG CCTAAGTTTG 4380
TGGGAGGAGG TCATCATCGT CTTTAGCCGT GAATGGTTAT TATAGAGGAC AGAAATTGAC 4440
ATTAGCCCAA AAAGCTTGTG GTCTTTGCTG GAACTCTACT TAATCTTGAG CAAATGTGGA 4500
CACCACTCAA TGGGAGAGGA GAGAAGTAAG CTGTTTGATG TATAGGGGAA AACTAGAGGC 4560
CTGGAACTGA ATATGCATCC CATGACAGGG AGAATAGGAG ATTCGGAGTT AAGAAGGAGA 4620
GGAGGTCAGT ACTGCTGTTC AGAGATTTTT TTTATGTAAC TCTTGAGAAG CAAAACTACT 4680
TTTGTTCTGT TTGGTAATAT ACTTCAAAAC AAACTTCATA TATTCAAATT GTTCATGTCC 4740
TGAAATAATT AGGTAATGTT TTTTTCTCTA TAG 4773



(11) INFORMATION FOR SEQ ID NO: 10:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 8835 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
(A) ORGANISM: human
(F) TISSUE TYPE: placenta
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1..8835
(C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

GTAAGAAATA TCATTCCTCT TTATTTGGAA AGTCAGCCAT GGCAATTAGA GGTAAATAAG 60 
CTAGAAAGCA ATTGAGAGGA ATATAAACCA TCTAGCATCA CTACGATGAG CAGTCAGTAT 12 0 

3Q CAACATAAGA AATATAAGCA AAGTCAGAGT AGAATTTTTT TCTTTTATCA GATATGGGAG 180 

AGTATCACTT TAGAGGAGAG GTTCTCAAAC TTTTTGCTCT CATGTTCCCT TTACACTAAG 24 0 
CACATCACAT GTTAGCATAA GTAACATTTT TAATTAAAAA TAACTATGTA CTTTTTTAAC 3 00 
AACAAAAAAA AGCATAAAGA GTGACACTTT TTTATTTTTA CAAGTGTTTT AACTGGTTTA 3 60 
ATAGAAGCCA TATAGATCTG CTGGATTCTC ATCTGCTTTG CATTCAGACT ACTGCAATAT 4 20 
TGCACAGAAT GCAGCCTCTG GTAAACTCTG TTGTACACTC ATGAGAGAAT GGGTGAAAAA 480 

GACAAATTAC GTCTTAGAAT TATTAGAAAT AGCTTTCACT TTAGGAACTC CCTGAGAATT 540

GCTGCTTTAG AGTGGTAAGA TAAATAAGCT TCTCTTTAAA CGGAATCTCA AGACAGAATC 6 00 
AGTTACATTA AAAGCAAACA AAAAATTTGC CCATGGTTAG TCATCTTGTG AAATCTGCCA 660 
CACCTTTGGA CTGGGCTACA ATTGGATAAT ATAGCATTCC CCGAGATAAT TTTCTCTCAC 720 
AATTAAGGAA AGGGCTGAAT AAATATCTCT GTTTGAAGTT GAATAACAAA AATTAGGACC 780 
CCCTAAATTT TAGGGCTCCT GAAATTCGTC TTTTTGCCTA TATTCAGCTA CTTTACGTTC 840 

40 TATTAAATCT TCTTTCAGGC CAGGTGCACT AGCTCATGCC TAGAATCTCA GGCAGGCCTG 900 

AGCCCAGGAA TTTGAGACCA GCCAGGGCAA CACAGTCTCT ACAAAAAAAT AAAAAATTAC 960 
CTGGGTGTGT TGGTGCATGC CTGTAGAACT ACTCAGGATG CTGAGGACTG CTTGAGCCCA 102 0 
GGATAGCCAA ATCTGTGGTG AGTTCAGCCA CTAAACAGAG CGAGACTTTC TCAAAAAAAC 108 0 
AAACAAAAAA ACAAACAAAC TTCCTTCAAA ATAACTTTTT ATCTGCAATG TTTTCCTATT 114 0 
GCCTGTGAGA TTAAATTTAC TCTTTTACCT GATTTCCAAA GCCCTCCATA ATCTAATCCG 1200 

45 ACTTTACCTT GTGTTCACTG CAAAATAGCA GGACTGTTCC ACTACAATCC AAAAAT C AC A 1260 

GGTTGGGTGC AGTGGCTCAC TCCTGTAATC CCAACACTTT GGAAGGCCAA GGCAGGTGGA 1320 
TTGCTTCAGC TCAGGAGTTC AAGACCAGCC TGGGCAACAT GGCAAAAACC CTGTCTCTCC 1380 
AAAACATACA AAAATTAGCC AGATGTGGTA GTATGTGCCT GTAGTCCCAA CTACTCAAAA 1440 
GGCTAAGGCA AGAGGATCAC TTGAGCCCAG GAGGTCAAGG CTACAGTGAG CCATGTTTAC 1500 
TGTGTCACTG CACTCCAGCC TGGGTGATAG AGCAAGACCA TGTCTCAAAA AAAAAAAAAA 1560 

50 GAAAAGAAAA GAAAAAAACA TCGCTCTATT CAGTTCACCC CCACCACAAC ATTGTTTTGA 162 0 

TTATCACATA AATGCTGGTC CATTGCCTTC TCTATCTATT CAAATCTTTA AGCATTCTTT 16 8 0 
GAGATTCAAC TCAATTCTCC TTTTCAAACT AGGCCATTTA AACTACATCA GTTCCATTTT 174 0 
GATTTTCTTG CTTTGAGTCT ACAGACTCAA AAACAAAAAC TTAAAAACTT ATTTTTTAAG 1800 
TTTTCTGCTA CTCTCACTTC TTCAACACTC ACATACACGC ATTCATAATA AGATGGCAGA 1860
ATGTTCAAGG ATAAAATGAT TTATAGAACT GAAAAGTTAG GTTTTGATCT TGTTGCTGTC 1920
AAGATGACTA CCTACCTGAT CTCAGGTAAT TAATTATGTA GCATGCTCCC TCATTTCATC 1980
CCATACCTAT TCAACAGGAT TGGAATTCCA CAGCAAGGAT AAACATAATC ATAGTTGCTT 2040
TTCAAGTTCA AGGCATTTTA ACTTTTAATC TAGTAGTATG TTTGTTGTTG TTGTTGTTGT 2100

TTGAGATGGA GCCCTGCTGT GTCACCCAGG CTGGAGTGCA GTGGCACGAA CTCGGCTCAC 2160 
TGCAACCTCT GCCTCATGGG TTCAATCAGT TATTCTGCCT CAGTGTCCCA AGTAGCTGGG 222 0 
ACTACAAGGC ACATGCCACC ATGCCTGGCT AATTTTTGTA TTTTTAGTAG AAACAGGGCT 2280 
TCACCATGTT GGCCAGGCTG GTCTCGAACT CCTGACCTCA AGTGATCCAG CCGCCTCGGC 23 4 0 
CTCCCAAAGT GCTGGGATTA CAGGCATAAG CCACCGTGCC CAGCCTAATA GTATGTTTTT 2400 
10 AAACTCTTAG TGGCTTAACA ATGCTGGTTG TATAATAAAT ATGCCATAAA TATTTACTGT 246 0 

CTTAGAATTA TGAAGAAGTG GTTACTAGGC CGTTTGCCAC ATATCAATGG TTCTCTCCTT 2 520 
ACAGCTTTAA TTAGAGTCTA GAATTGCAGG TTGGTAGAGC TGGAACAGAC CTTAAAGATT 25 80 
GACTAGCCAA CTTCCTTGTC CAAATGAGGG AACTGAGACC CTTAAAATTA AGTGACTTGC 264 0 
CCCAGACAAA ACTGGAACTC ATGTGTCCTA ATTTCCATCA TGAAATTCTA CCATTCACTA 2 70 0 
GCCTCTGGCT AGTTGTCAAA GTATTGCATA ACTAAATTTT TATGTCTGTT TTAAAGAACA 2 760 
AATTGTCACT GCTTACTCCT GGGAGGGTCT TTCTGAGGTG GTTTATAACT CTTAAAAAAA 282 0 
AAAAAGTCAG TAGTCTGAGA ATTTTAGACG AAATAGTCAA AGCATTTTTA TCCAATGGAT 2 88 0 
CTATAATTTT CATAGATTAG AGTTAAATCA AAGAAACACG GATGAGAAAG GAAGAGGAAA 2 94 0 
ATTGAGGAGA GGAGGAATGG GGATGAGAAC ACACTACTTG TAATCAGTCA TAGATGTACT 3000
GAGAACTAAC AAGAAGAATT GTAAGAAAAT AAGAATGAAG AATTCAAAAT CAACACATGA 3 06 0 
2Q AATAAAAAGA AACTACTAGG GAAAAATGGA GAAGACATTA GAAAAATTAT TCTATTTTTA 312 0 

AAATTCTGTT TTCAGGCTTC CCTCCTGTTC TTCCTCCTTC TCATTGGTTT TCAGGTGGAG 318 0 
GGAAAGTTTA AGATGGAAAA AATATATATA TTCTACACAT CCCTTTCTAC GCTGTTGTCA 3 24 0 
TGGCAACAAG GTTTATCATA GCAAACTTTT ATTCATACAA CATTTATTGA GTTCTTACTG 33 00 
TGTGGTAAGC TCTTTCCAGG TGTTGAAAAT TCAGGGGAAA AAAGACAACT CATTGTCTTA 33 60 
AAACTCAGAT GAAAGCTGAA CAGACCTATT TTTAATCAAA GTAATCTCAA TTTAGGGTAG 3 42 0 
TAAGAGCTAT TTAAGAAGCA TGAACAGGTG TGAAGGAGGT AGGACTCTGA GGAGAGAATA 3480

GTTAGCTAGG AATGAAAGAG CAGAGAAGTT TTCCTAGAGG AACTATTAAA GCTGGGAGTT 3 54 0 
ACGGGATGAA AGATGAGGCA GGGTTTGCAG GCAAAAAAAA AAAAAAGGCA GGGGAAGGGG 3 60 0 
AAGTTCTGGC CTGGCAGAGA GAATAACTGT GGCAACAATG GAGGAGAGTC TGGAAGCAAG 3660
AAAACCAAGT AGAAGAGTAT TAAAATAGAA GATGCCAGGG GTAATGAGGG CTTGATTTAA 3 72 0 
AACAGTGCTG TTGGAGATGG AGAGGAGATA CCAAATTCTG GAGACATTTC TGAGTTAGAA 3 78 0 
CCTACAGTAT TTATCAGACA AGGGAAAGAT TAGACAAAGG AGTTAAGAAT GACTCCCAGG 3840

TTTCAGTTTG GGGCAGGTAA CTAGGACATG TTTTGAAAAG TAATGTATTG GATCTCTTAC 3 90 0 
CATTGGAACT ATGTATGTGG AGCCAAATTA AAATTTGTAC ATGTATATAA CTCTCCCCCC 3 9 60 
ACCACCACTA ACTACTTCCC TAACTCTCTA CTTTGTAGCC AGACTTCCTA AAAGAATAGT 4020
TTGTAGTCAC TGTCTTTACT TTTCCCCTCC CATTCTGTCC TAGATATTTG TCCACCTACC 408 0 
ATCTGCTGCC TCCACTTTAC CCAAACTGTT CTACGGTTGC CCAAAACTTC CTAATTGCCA 4140 
35 AATTCAATGA ACAAGTTTAA GCTTATATGT AAATTAGGAG CTCTACAGTT TGATTTCGAG 4200 

CAGCCCCTCC TGAAACCCTT TCTCTTTCGA CTTCTGTGAC ACATCTCAGA TTTACAAAAC 4260 
TGAACTAATT ATTTTACACT TGAGCTGTAT TTTCGTTCTT CTTTCTTGAT GAATGAGGTA 4320 
ACCACTCAAC AAATTGCCCA AGCCAAAAAC TACGAAGTCA TCCTCAGTTC CTCCTTCTTC 4380
TGTTTGACCC ACAACAGATC AGCTGAGAAA TCCCGCTGTT TAGTATCTCT TGAATTCATT 444 0 
ACCTTAATTT ATAGCCTCAT CAACTCTTAA TTGTTAAAAT TACTTCAGTA GTTGTTGTCT 4 500 
40 GACCTCTGTC CAATCTTGTT CAATCAGGTC CATTCTTTTG TTCTTGGTGG TGGTGGTGGT 456 0 

GTTGACAGAG TTTCGCTTTT GCTGCCCAGG CTGAAGTGCA GTGGAGCACT TCACTGCAAC 4620 
CACAGCCTCC TGGGTTTAAG CAGTTCACCC TCCCGAGTAG CTGGGACTAC AGGTATGTGC 46 80 
CACCACACCC AGCTAATTTT GTGTTTTCAG TAGAGACAGG GTTTCACCAT GTTGGTCAGG 4740 
CTGGTCTCAA ACTCCTGACC TCAAGCAATC CACCCACCTC AGCCTCCCAA AGTGCTGGGA 4800
TTACAGGCAT GAGCCACTGC ACACGGACCA GATCCATTGT TTATGTTGCT TCTAGAGTGA 4860

GTTTTTAAAA CACAAATTTG ACCATATCTT TCTCCAATTT AAGTCAGTAT TTTTTTTTTC 4 92 0 
AGGAAAAAAC AGTTCAAACT CTTTAGTCTG CTTACACAAG GCCTTTGTAG TCTGACTCTT 4 980 
CTTTCCAAGC TTTCATCAAA GTATACTGCA AGTTACATTT TATGTGAATT GAATTAGGCA 504 0 
ACGGTATAAA AATTATAGTT TATATGGGCA AAATGGAAAT AATGTTAACT CTTCCAAATA 5100 
GTTTATCTAG AATGACATAA TTTCAAAGCT GTCAGGTCAA ATGAGTTATA AACTGTTAAC 516 0 
ACTATTGCCA CATGCAAGTG TCTCTTATAC TTGGTAGAAT TATCTGCTTC CATGTCATTA 5220

TTATGTAAAT TAGACTTTAA ATAACTCAGA AGTTCTTCAG ACATACAGGT TATTATTGTG 5280 
CTTTTTAAAC ATAATTTTAA ATAATTTTAT ATATGATAAT GTTATCCAAG TGCTAAGGGA 534 0 
TGTATTGTTA CTGCTGTGCA AAAAAAAAAA AAAAAAAAAC TCCAAATAAA TATGTTGAAA 5400 
CCAAGTTTAT ATGCAAGAAA ACAATATTAA AAAGGCCAAA GTACCACCAT AATAGGCTGT 5460
GTGGAGACGG CAGGCTACAA AACACTAGTA ATAATGCTGA GAAAGTTGAA AAAAGAAAGA 5520
AAGCAACAAT ATGCTTTGGT TGTTGTAGGT TTATGTACTC CAAGAATATC TCCTCTCAAA 5580
CTTTTACGTT TTTTCCAAAG AAAAGTTAAC TTTGGCTGGG CGCAGTGGCT CTTGCCTGTA 564 0 
GTCCCAGCCT TTGGGAGGCC AAGGCGGGCA GATCACCTGA GGTCAGGAGT TTGAGACCAG 5700 
CCTGACCAAA AATGGAGAAA CCCGCCCCCC TCACTACTAA AAGAATACAA AATTAGGCCG 57 60 
5 GGCACAGTGG CTTACCCCTG TGATCCCAGC ACTTTGGGAG GCCGAAGCAG GAAGATCACC 58 2 0 

TGAGGTCAGG AGTTCGAGAC CAGCCATGGA GAAACCCGTC TCTACTAAAA ATACAAAATT 588 0 
AGCCGGGCGT GGTGGTGCAT GACTGTAATC CCAGCTACTC AGGAGGCTAA GGCAGAGAAT 5940
CACTTGAACC CAGGCAGTGG AGGTTGCAGT GAGCCGAGAT CGTGCCATTG CACTCCAGCC 600 0 
TGGGCAACAA GAGCGAAACT CTGTATCCAA AAAACAAAAG AAAAGAAAAG GTAACCTTGA 606 0 
ACTATGTGAG ATCTTTAGAA ATGCATTCTT TCTGTAAAAT GTGACTACAT TTGCCTTATT 6120

TATGGTAAAA ATGTTGAGGC CTCAAACAAC CCATATTTTC TCGGTCTCCC CGCTGCCTAG 618 0 
CCTTTGTTCA CATTGCTTCT TCTTGGTGGA AGCTCTTCCT CTGGCCTTGA AAATGCCTGC 62 4 0 
TTCTCTTTCA AGGTAGCACA GTCATCACTT TCTGTGGTAA CCTTCTCCAG CACCATCAAA 6 3 00 
CAGAAAGAAT GAATCTCTTG TAAATTCAGC TCTTACGTCA TTCATTACAT TATTTTGTAA 63 6 0 
CTCTTTATAG ATTCTTCTCT CCCACTAGAC TCTGAGTCAC TGGAGAGTAG GAGCCAACTC 6 420 
TCATTCATGT GTGGTTTGGT CAGCTACTGG CCACATTCCT GATGCATAGT TAATGCTCAA 6480

ACCTTAACTG GTGAATCAGC TCAAATATTG TCCTTCTCTA AATCCATTCA CTCATTGACT 6 54 0 
AACTATGTAC TCAAAATAGT AAACACCAGT AATTTAATCC AATTCCTGCC CATACTGCTT 6 600 
GGTACATTTC AGGTGAATTA GTTTGATAAA TATGTGTGTA TTACATAATA TTAAAGTATG 6 660 
TACAGAAGAT CATGCTAATC ATAATTCACA ACTGATAACT AATCAAACAT AAATGCTCTC 6 72 0 
AGGTTAACAA ATGTCTGCCT TCTCAGTTAA TGCAGTCATT AACAAACACC TTCTGATGCT 6780
20 GATAATAGGG CCTTGTTCAG CAATGAAGCC ATAAAGGTGA ATAAAGAACA TGCCCTCGTG 6 84 0 

GAGCTCACAG CCTAGTCATT ATTGTTCTGA TTTTTAATAT TAATGTTGGT TTGGGTTTTG 6 90 0 
GTGAAAAATG TTTAGACTTA TCTTAGTGAT CTTTTCATCC TTTGCTATAT TATTTTTCTC 6 96 0 
TAAGAGTCTT CCTTATCCCC TCCTTTAAAA AACTAGGTGA TAATTCTAAA TTGTAAATTT 7 02 0 
AAATATTATA AATAGCTTAT AAAATTTAAT ATTTATAATA TTTAAATGTT TGATAAATAT 7 0 80 
TTAAATTTTA TAATATTTAA ATGTTTATTT AAATTCATTT GTACATCAGT TTTTATTTTA 7140 
25 TTTAAATGTG TTGGCCAGGC ATGGTGGCTG ACACCTATAA TCCCAGAACT TTGAGAGGCC 7 200 

AAGTCAGGCA AACCATTTGA GCTCAGGAGT TTGAGACCAC CCTGGGCAAC GTGGTGAAAC 7 260 
CCTGTCTCTA CCAAACATAT GAAAACTTAT CTGGGTGTGG TGGCACGCAT CTGTGGTCCC 73 20 
AGATGGGAGT CCCAGGCTAA GATGGGAGAA TCGCTTGAAC CCAGGTGAGA GGGGTGGGGT 73 80 
GGATGTTGCA GTGAGCTGAG ATCGTGCCAC TGCACTCCAA CCTGGGTGAC AGAGTGAGAC 74 4 0 
TCCATCTCAA AAAAAAAAAA TGTTATCTAA ATAAGATAAA TTTAATAACT GTTCGCACTT 7 500 
AGATGAGCAT AAGGAACTAA ACCTAGATAA AACTATCAAA TAAGGCCTGG GTACAGTGAC 7 56 0 
TCATGCCTGT AATCTCAAGC ACTTTGGGAG GCCAAAATTA TACAAAGTTA GTTGTATAAC 762 0 
ACCAACTAAC AACTATTTTG GGGTTAGCTT AATTCAGATT AATTTTTTTT AAACTGAGTT 768 0 
TTAAATTCCT GCTTACTCTA CCATACATGC TAGGCCTCAT ATTATGCTAG AAAAATTTTG 774 0 
AGCACAGATT TATGAATACT CTCCTGCATA CCATTTAATT TTTAAACAAA TTTTAATGCA 7800
GTATATATGT GCCTTTTTAC CAACACATTA AATAATAAGA TCTACTGTGA GGACTAAATT 7860

TCTGTAATTT CAAAGTAGTA ATGAGTTTAA ACCATGTCTC AAGATCTCTG CAATAACTGT 792 0 
AGCACAACAG AAAATAGGTA TTTCTATTAA TGACAGAGTC ACAAGTACTA CTAATAATAC 798 0 
TGTGGTTTGT TTCCTGCAAC TAATCATGGG AGGAATGCTA AATTTCAGAG GTTGGTGAAA 8 04 0 
ATACATGTGT ATTTTTTTCC CCATCCAAGT TCACAGATTT CTCACACTGA GAACTCCTAT 810 0 
TCCATAACAA AATTCTGGAA GCCTGCACAC CGTATTGGAA GAAGGGCAGA AAGGAAAAGC 8160 
40 AAATGGAAGG ATTTAAATTT TTTTCAAATC CTGTATCCCT TGATTTTACA GCAAGATTGT 8220 

ATTTATGTAT TACTTGTGTT AAAAATATAG TATAATCGAG ACTCCAGATC AAAAATCACC 8280
GCAGCTCAGG GAGAAAGAGG GCCACCAAAT GCCAGAGCCC TTCAGCCTTC TCCCACCCTG 83 40 
CCTGTACCCT CAGATGGAAG CACTTTTTTA TCATTGTTTC ACCTTTAGCA TTTTGACAAT 84 00 
GAAGTCACAA ACCTTCAGCC TCTCACCCAT AGGAACCCAC TGGTTGTAAG AGAAGGATGA 84 60 
AGCCAGTCCT TCCTAAAGGG CACGATTAGA TGTGTTTATG GCATCCTCAG GTGAAACTAT 852 0 
45 ATTTATATTG ACAATATATT TATATTTCTC AAGGAATACT AGAATAATGA TTCAGTTCAG 858 0 

TACTAGGCCA TTTATCTACC CTTTATAATA TTGTTTAATG AGAAAATGCT TTCTATCTTC 8640
CAAATATCTG ATGATTTGTA AGAGAACACT TAAACATGGG TATTCATAAG CTGAAACTTC 8700 
TGGCATTTAT TGAATGTCAA GATTGTTCAT CAGTATACTA GGTGATTAAC TGACCACTGA 8760 
ACTTGAAGGT AGTATAAAGT AGTAGTAAAA GGTACAATCA TTGTCTCTTA ACAGATGGCT 8820 
CTTTGCTTTC ATTAG 8835



(12) INFORMATION FOR SEQ ID NO: 11:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1371 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
(A) ORGANISM: human
(F) TISSUE TYPE: placenta
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1..1371
(C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

GTAAGGCTAA TGCCATAGAA CAAATACCAG GTTCAGATAA ATCTATTCAA TTAGAAAAGA 60 
TGTTGTGAGG TGAACTATTA AGTGACTCTT TGTGTCACCA AATTTCACTG TAATATTAAT 120 
GGCTCTTAAA AAAATAGTGG ACCTCTAGAA ATTAACCACA ACATGTCCAA GGTCTCAGCA 180 
CCTTGTCACA CCACGTGTCC TGGCACTTTA ATCAGCAGTA GCTCACTCTC CAGTTGGCAG 24 0 
TAAGTGCACA TCATGAAAAT CCCAGTTTTC ATGGGAAAAT CCCAGTTTTC ATTGGATTTC 3 00 
CATGGGAAAA ATCCCAGTAC AAAACTGGGT GCATTCAGGA AATACAATTT CCCAAAGCAA 3 60 
ATTGGCAAAT TATGTAAGAG ATTCTCTAAA TTTAGAGTTC CGTGAATTAC ACCATTTTAT 4 20 
GTAAATATGT TTGACAAGTA AAAATTGATT CTTTTTTTTT TTTTCTGTTG CCCAGGCTGG 4 80 
AGTGCAGTGG CACAATCTCT GCTCACTGCA ACCTCCACCT CCTGGGTTCA AGCAATTCTC 54 0 
CTGCCTCAGC CTTCTGAGTA GCTGGGACTA CAGGTGCATC CCGCCATGCC TGGCTAATTT 6 00 
TTGGGTATTT TTACTAGAGA CAGGGTTTTG GCATGTTGTC CAGGCTGGTC TTGGACTCCT 660 
GATCTCAGAT GATCCTCCTG GCTCGGGCTC CCAAAGTGCT GGGATTACAG GCATGAACCA 72 0 
CCACACATGG CCTAAAAATT GATTCTTATG ATTAATCTCC TGTGAACAAT TTGGCTTCAT 7 80 
TTGAAAGTTT GCCTTCATTT GAAACCTTCA TTTAAAAGCC TGAGCAACAA AGTGAGACCC 84 0 
CATCTCTACA AAAAACTGCA AAATATCCTG TGGACACCTC CTACCTTCTG TGGAGGCTGA 9 00 
AGCAGGAGGA TCACTTGAGC CTAGGAATTT GAGCCTGCAG TGAGCTATGA TCCCACCCCT 96 0 
ACACTCCAGC CTGCATGACA GTAGACCCTG ACACACACAC ACAAAAAAAA ACCTTCATAA 10 2 0 
AAAATTATTA GTTGACTTTT CTTAGGTGAC TTTCCGTTTA AGCAATAAAT TTAAAAGTAA 108 0 
AATCTCTAAT TTTAGAAAAT TTATTTTTAG TTACATATTG AAATTTTTAA ACCCTAGGTT 114 0 
TAAGTTTTAT GTCTAAATTA CCTGAGAACA CACTAAGTCT GATAAGCTTC ATTTTATGGG 12 00 
CCTTTTGGAT GATTATATAA TATTCTGATG AAAGCCAAGA CAGACCCTTA AACCATAAAA 12 6 0 
ATAGGAGTTC GAGAAAGAGG AGTAGCAAAA GTAAAAGCTA GAATGAGATT GAATTCTGAG 132 0 
TCGAAATACA AAATTTTACA TATTCTGTTT CTCTCTTTTT CCCCCTCTTA G 13 71 



(13) INFORMATION FOR SEQ ID NO: 12:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 3383 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
(A) ORGANISM: human
(F) TISSUE TYPE: placenta
(ix) FEATURE:
(A) NAME/KEY: intron
(B) LOCATION: 1..3383
(C) IDENTIFICATION METHODS: E
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:

GTAAAGTAGA AATGAATTTA TTTTTCTTTG CAAACTAAGT ATCTGCTTGA GACACATCTA 6 0 

TCTCACCATT GTCAGCTGAG GAAAAAAAAA AATGGTTCTC ATGCTACCAA TCTGCCTTCA 120 

AAGAAATGTG GACTCAGTAG CACAGCTTTG GAATGAAGAT GATCATAAGA GATACAAAGA 18 0 

AGAACCTCTA GCAAAAGATG CTTCTCTATG CCTTAAAAAA TTCTCCAGCT CTTAGAATCT 24 0 

ACAAAATAGA CTTTGCCTGT TTCATTGGTC CTAAGATTAG CATGAAGCCA TGGATTCTGT 3 00 

TGTAGGGGGA GCGTTGCATA GGAAAAAGGG ATTGAAGCAT TAGAATTGTC CAAAATCAGT 360 

AACACCTCCT CTCAGAAATG CTTTGGGAAG AAGCCTGGAA GGTTCCGGGT TGGTGGTGGG 4 20 




EP 0 816 499 A2 






GTGGGGCAGA AAATTCTGGA AGTAGAGGAG ATAGGAATGG GTGGGGCAAG AAGACCACAT 4 80 
TCAGAGGCCA AAAGCTGAAA GAAACCATGG CATTTATGAT GAATTCAGGG TAATTCAGAA 54 0 
TGGAAGTAGA GTAGGAGTAG GAGACTGGTG AGAGGAGCTA GAGTGATAAA CAGGGTGTAG 6 00 
AGCAAGACGT TCTCTCACCC CAAGATGTGA AATTTGGACT TTATCTTGGA GATAATAGGG 66 0 
TTAATTAAGC ACAATATGTA TTAGCTAGGG TAAAGATTAG TTTGTTGTAA CAAAGACATC 72 0 
CAAAGATACA GTAGCTGAAT AAGATAGAGA ATTTTTCTCT CAAAGAAAGT CTAAGTAGGC 780 
AGCTCAGAAG TAGTATGGCT GGAAGCAACC TGATGATATT GGGACCCCCA ACCTTCTTCA 84 0 
GTCTTGTACC CATCATCCCC TAGTTGTTGA TCTCACTCAC ATAGTTGAAA ATCATCATAC 900 
TTCCTGGGTT CATATCCCAG TTATCAAGAA AGGGTCAAGA GAAGTCAGGC TCATTCCTTT 960 
CAAAGACTCT AATTGGAAGT TAAACACATC AATCCCCCTC ATATTCCATT GACTAGAATT 1020
TAATCACATG GCCACACCAA GTGCAAGGAA ATCTGGAAAA TATAATCTTT ATTCCAGGTA 10 8 0 
GCCATATGAC TCTTTAAAAT TCAGAAATAA TATATTTTTA AAATATCATT CTGGCTTTGG 114 0 
TATAAAGAAT TGATGGTGTG GGGTGAGGAG GCCAAAATTA AGGGTTGAGA GCCTATTATT 12 0 0 
TTAGTTATTA CAAGAAATGA TGGTGTCATG AATTAAGGTA GACATAGGGG AGTGCTGATG 1260
AGGAGCTGTG AATGGATTTT AGAAACACTT GAGAGAATCA ATAGGACATG ATTTAGGGTT 1320
GGATTTGGAA AGGAGAAGAA AGTAGAAAAG ATGATGCCTA CATTTTTCAC TTAGGCAATT 1380

TGTACCATTC AGTGAAATAG GGAACACAGG AGGAAGAGCA GGTTTTGGTG TATACAAAGA 144 0 
GGAGGATGGA TGACGCATTT CGTTTTGGAT CTGAGATGTC TGTGGAACGT CCTAGTGGAG 150 0 
ATGTCCACAA ACTCTTCTAC ATGTGGTTCT GAGTTCAGGA CACAGATTTG GGCTGGAGAT 15 60 
AGAGATATTG TAGGCTTATA CATAGAAATG GCATTTGAAT CTATAGAGAT AAAAAGACAC 16 2 0 
ATCAGAGGAA ATGTGTAAAG TGAGAGAGGA AAAGCCAAGT ACTGTGCTGG GGGGAATACC 168 0 
20 TACATTTAAA GGATGCAGTA GAAAGAAGCT AATAAACAAC AGAGAGCAGA CTAACCAAAA 174 0 

GGGGAGAAGA AAAACCAAGA GAATTCCACC GACTCCCAGG AGAGCATTTC AAGATTGAGG 18 0 0 
GGATAGGTGT TGTGTTGAAT TTTGCAGCCT TGAGAATCAA GGGCCAGAAC ACAGCTTTTA 1860 
GATTTAGCAA CAAGGAGTTT GGTGATCTCA GTGAAAGCAG CTTGATGGTG AAATGGAGGC 192 0 
AGAGGCAGAT TGCAATGAGT GAAACAGTGA ATGGGAAGTG AAGAAATGAT ACAGATAATT 198 0 
CTTGCTAAAA GCTTGGCTGT TAAAAGGAGG AGAGAAACAA GACTAGCTGC AAAGTGAGAT 2 04 0 
25 TGGGTTGATG GAGCAGTTTT AAATCTCAAA ATAAAGAGCT TTGTGCTTTT TTGATTATGA 2100 

AAATAATGTG TTAATTGTAA CTAATTGAGG CAATGAAAAA AGATAATAAT ATGAAAGATA 216 0 
AAAATATAAA AACCACCCAG AAATAATGAT AGCTACCATT TTGATACAAT ATTTCTACAC 222 0 
TCCTTTCTAT GTATATATAC AGACACAGAA ATGCTTATAT TTTTATTAAA AGGGATTGTA 2 28 0 
CTATACCTAA GCTGCTTTTT CTAGTTAGTG ATATATATGG ACATCTCTCC ATGGCAACGA 234 0 
GTAATTGCAG TTATATTAAG TTCATGATAT TTCACAATAA GGGCATATCT TTGCCCTTTT 24 00 
TATTTAATCA ATTCTTAATT GGTGAATGTT TGTTTCCAGT TTGTTGTTGT TATTAACAAT 24 60 
GTTCCCATAA GCATTCCTGT ACACCAATGT TCACACATTT GTCTGATTTT TTCTTCAGGA 2 52 0 
TAAAACCCAG GAGGTAGAAT TGCTGGGTTG ATAGAAGAGA AAGGATGATT GCCAAATTAA 258 0 
AGCTTCAGTA GAGGGTACAT GCCGAGCACA AATGGGATCA GCCCTAGATA CCAGAAATGG 264 0 
CACTTTCTCA TTTCCCCTTG GGACAAAAGG GAGAGAGGCA ATAACTGTGC TGCCAGAGTT 27 0 0 
35 AAATTTGTAC GTGGAGTAGC AGGAAATCAT TTGCTGAAAA TGAAAACAGA GATGATGTTG 276 0 

TAGAGGTCCT GAAGAGAGCA AAGAAAATTT GAAATTGCGG CTATCAGCTA TGGAAGAGAG 2820 
TGCTGAACTG GAAAACAAAA GAAGTATTGA CAATTGGTAT GCTTGTAATG GCACCGATTT 2880 
GAACGCTTGT GCCATTGTTC ACCAGCAGCA CTCAGCAGCC AAGTTTGGAG TTTTGTAGCA 2 940 
GAAAGACAAA TAAGTTAGGG ATTTAATATC CTGGCCAAAT GGTAGACAAA ATGAACTCTG 3000 
AGATCCAGCT GCACAGGGAA GGAAGGGAAG ACGGGAAGAG GTTAGATAGG AAATACAAGA 30 60 
GTCAGGAGAC TGGAAGATGT TGTGATATTT AAGAACACAT AGAGTTGGAG TAAAAGTGTA 3120

AGAAAACTAG AAGGGTAAGA GACCGGTCAG AAAGTAGGCT ATTTGAAGTT AACACTTCAG 3180 
AGGCAGAGTA GTTCTGAATG GTAACAAGAA ATTGAGTGTG CCTTTGAGAG TAGGTTAAAA 324 0 
AACAATAGGC AACTTTATTG TAGCTACTTC TGGAACAGAA GATTGTCATT AATAGTTTTA 33 0 0 
GAAAACTAAA ATATATAGCA TACTTATTTG TCAATTAACA AAGAAACTAT GTATTTTTAA 33 6 0 
ATGAGATTTA ATGTTTATTG TAG 3 3 83 




(14) INFORMATION FOR SEQ ID NO: 13:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11464 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
(A) ORGANISM: human
(F) TISSUE TYPE: placenta
(ix) FEATURE:
(A) NAME/KEY: 5' UTR
(B) LOCATION: 1..3
(C) IDENTIFICATION METHODS: E
(A) NAME/KEY: leader peptide
(B) LOCATION: 4..82
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: intron
(B) LOCATION: 83..1453
(C) IDENTIFICATION METHODS: E
(A) NAME/KEY: leader peptide
(B) LOCATION: 1454..1465
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: intron
(B) LOCATION: 1466..4848
(C) IDENTIFICATION METHODS: E
(A) NAME/KEY: leader peptide
(B) LOCATION: 4849..4865
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: mat peptide
(B) LOCATION: 4866..4983
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: intron
(B) LOCATION: 4984..6317
(C) IDENTIFICATION METHODS: E
(A) NAME/KEY: mat peptide
(B) LOCATION: 6318..6451
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: intron
(B) LOCATION: 6452..11224
(C) IDENTIFICATION METHODS: E
(A) NAME/KEY: mat peptide
(B) LOCATION: 11225..11443
(C) IDENTIFICATION METHODS: S
(A) NAME/KEY: 3' UTR
(B) LOCATION: 11444..11464
(C) IDENTIFICATION METHODS: E
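
The feature coordinates listed for SEQ ID NO: 13 can be checked for internal consistency with a few lines of arithmetic. The following is only an illustrative sketch, not part of the original listing: the names `FEATURES`, `SEQ_LENGTH`, `check_tiling` and `lengths_of` are hypothetical, and the coordinates are transcribed from the feature table above with the OCR spacing removed (1-based, inclusive).

```python
# Feature table for SEQ ID NO: 13 as (name, start, end), 1-based inclusive.
# Transcribed from the listing; this transcription is an assumption of the sketch.
FEATURES = [
    ("5' UTR", 1, 3),
    ("leader peptide", 4, 82),
    ("intron", 83, 1453),
    ("leader peptide", 1454, 1465),
    ("intron", 1466, 4848),
    ("leader peptide", 4849, 4865),
    ("mat peptide", 4866, 4983),
    ("intron", 4984, 6317),
    ("mat peptide", 6318, 6451),
    ("intron", 6452, 11224),
    ("mat peptide", 11225, 11443),
    ("3' UTR", 11444, 11464),
]
SEQ_LENGTH = 11464  # stated under (A) LENGTH


def span_length(start, end):
    """Number of bases in an inclusive 1-based interval."""
    return end - start + 1


def check_tiling(features, total_length):
    """Raise ValueError unless the features cover 1..total_length with no gap or overlap."""
    expected_start = 1
    for name, start, end in features:
        if end < start:
            raise ValueError(f"{name}: end {end} precedes start {start}")
        if start != expected_start:
            raise ValueError(f"{name}: starts at {start}, expected {expected_start}")
        expected_start = end + 1
    if expected_start - 1 != total_length:
        raise ValueError(f"features end at {expected_start - 1}, not {total_length}")


def lengths_of(features, key):
    """Lengths of all features whose name equals key, in listing order."""
    return [span_length(s, e) for name, s, e in features if name == key]


if __name__ == "__main__":
    check_tiling(FEATURES, SEQ_LENGTH)
    print("intron lengths:", lengths_of(FEATURES, "intron"))
    print("leader peptide bases:", sum(lengths_of(FEATURES, "leader peptide")))
    print("mat peptide bases:", sum(lengths_of(FEATURES, "mat peptide")))
```

Under this transcription the features tile positions 1 to 11464 exactly. The four intron lengths come out as 1371, 3383, 1334 and 4773 bases, which agree with the lengths stated for SEQ ID NO: 11, SEQ ID NO: 12, the 1334-base sequence listed immediately before SEQ ID NO: 9, and SEQ ID NO: 9. The leader peptide segments total 108 bases (36 codons) and the mat peptide segments total 471 bases (157 codons).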

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

AAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA 
48 

Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala
-35 -30 -25 

ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G GTAAGG CTAATGCCAT 
98 

Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala
-20 -15 -10 

AGAACAAATA CCAGGTTCAG ATAAATCTAT TCAATTAGAA AAGATGTTGT GAGGTGAACT 
158 

ATTAAGTGAC TCTTTGTGTC ACCAAATTTC ACTGTAATAT TAATGGCTCT TAAAAAAATA 
218 

GTGGACCTCT AGAAATTAAC CACAACATGT CCAAGGTCTC AGCACCTTGT CACACCACGT 
278 

GTCCTGGCAC TTTAATCAGC AGTAGCTCAC TCTCCAGTTG GCAGTAAGTG CACATCATGA
338 

AAATCCCAGT TTTCATGGGA AAATCCCAGT TTTCATTGGA TTTCCATGGG AAAAATCCCA 
398 

GTACAAAACT GGGTGCATTC AGGAAATACA ATTTCCCAAA GCAAATTGGC AAATTATGTA 
458 

AGAGATTCTC TAAATTTAGA GTTCCGTGAA TTACACCATT TTATGTAAAT ATGTTTGACA
518

AGTAAAAATT GATTCTTTTT TTTTTTTTCT GTTGCCCAGG CTGGAGTGCA GTGGCACAAT
578 

CTCTGCTCAC TGCAACCTCC ACCTCCTGGG TTCAAGCAAT TCTCCTGCCT CAGCCTTCTG 
638 

AGTAGCTGGG ACTACAGGTG CATCCCGCCA TGCCTGGCTA ATTTTTGGGT ATTTTTACTA 
698 

GAGACAGGGT TTTGGCATGT TGTCCAGGCT GGTCTTGGAC TCCTGATCTC AGATGATCCT 
758 

CCTGGCTCGG GCTCCCAAAG TGCTGGGATT ACAGGCATGA ACCACCACAC ATGGCCTAAA

818 

AATTGATTCT TATGATTAAT CTCCTGTGAA CAATTTGGCT TCATTTGAAA GTTTGCCTTC 
878 

ATTTGAAACC TTCATTTAAA AGCCTGAGCA ACAAAGTGAG ACCCCATCTC TACAAAAAAC
938 

TGCAAAATAT CCTGTGGACA CCTCCTACCT TCTGTGGAGG CTGAAGCAGG AGGATCACTT

998 

GAGCCTAGGA ATTTGAGCCT GCAGTGAGCT ATGATCCCAC CCCTACACTC CAGCCTGCAT 
1058 

GACAGTAGAC CCTGACACAC ACACACAAAA AAAAACCTTC ATAAAAAATT ATTAGTTGAC 
1118 

TTTTCTTAGG TGACTTTCCG TTTAAGCAAT AAATTTAAAA GTAAAATCTC TAATTTTAGA

1178 

AAATTTATTT TTAGTTACAT ATTGAAATTT TTAAACCCTA GGTTTAAGTT TTATGTCTAA
1238 

ATTACCTGAG AACACACTAA GTCTGATAAG CTTCATTTTA TGGGCCTTTT GGATGATTAT 
1298 

ATAATATTCT GATGAAAGCC AAGACAGACC CTTAAACCAT AAAAATAGGA GTTCGAGAAA
1358 

GAGGAGTAGC AAAAGTAAAA GCTAGAATGA GATTGAATTC TGAGTCGAAA TACAAAATTT 
1418 

TACATATTCT GTTTCTCTCT TTTTCCCCCT CTTAG CT GAA GAT GAT G GTAAA
1470

Ala Glu Asp Asp Glu
-10

GTAGAAATGA ATTTATTTTT CTTTGCAAAC TAAGTATCTG CTTGAGACAC ATCTATCTCA 

1530 

CCATTGTCAG CTGAGGAAAA AAAAAAATGG TTCTCATGCT ACCAATCTGC CTTCAAAGAA 
1590 

ATGTGGACTC AGTAGCACAG CTTTGGAATG AAGATGATCA TAAGAGATAC AAAGAAGAAC 
1650 

CTCTAGCAAA AGATGCTTCT CTATGCCTTA AAAAATTCTC CAGCTCTTAG AATCTACAAA
1710 

ATAGACTTTG CCTGTTTCAT TGGTCCTAAG ATTAGCATGA AGCCATGGAT TCTGTTGTAG 
1770

GGGGAGCGTT GCATAGGAAA AAGGGATTGA AGCATTAGAA TTGTCCAAAA TCAGTAACAC
1830 

CTCCTCTCAG AAATGCTTTG GGAAGAAGCC TGGAAGGTTC CGGGTTGGTG GTGGGGTGGG 
1890 

GCAGAAAATT CTGGAAGTAG AGGAGATAGG AATGGGTGGG GCAAGAAGAC CACATTCAGA 
1950

GGCCAAAAGC TGAAAGAAAC CATGGCATTT ATGATGAATT CAGGGTAATT CAGAATGGAA 
2010 

GTAGAGTAGG AGTAGGAGAC TGGTGAGAGG AGCTAGAGTG ATAAACAGGG TGTAGAGCAA 
2070 

GACGTTCTCT CACCCCAAGA TGTGAAATTT GGACTTTATC TTGGAGATAA TAGGGTTAAT 
2130

TAAGCACAAT ATGTATTAGC TAGGGTAAAG ATTAGTTTGT TGTAACAAAG ACATCCAAAG 
2190 

ATACAGTAGC TGAATAAGAT AGAGAATTTT TCTCTCAAAG AAAGTCTAAG TAGGCAGCTC 
2250

AGAAGTAGTA TGGCTGGAAG CAACCTGATG ATATTGGGAC CCCCAACCTT CTTCAGTCTT
2310 

GTACCCATCA TCCCCTAGTT GTTGATCTCA CTCACATAGT TGAAAATCAT CATACTTCCT 
2370 

GGGTTCATAT CCCAGTTATC AAGAAAGGGT CAAGAGAAGT CAGGCTCATT CCTTTCAAAG 
2430 

ACTCTAATTG GAAGTTAAAC ACATCAATCC CCCTCATATT CCATTGACTA GAATTTAATC
2490 

ACATGGCCAC ACCAAGTGCA AGGAAATCTG GAAAATATAA TCTTTATTCC AGGTAGCCAT 
2550 

ATGACTCTTT AAAATTCAGA AATAATATAT TTTTAAAATA TCATTCTGGC TTTGGTATAA 
2610 

AGAATTGATG GTGTGGGGTG AGGAGGCCAA AATTAAGGGT TGAGAGCCTA TTATTTTAGT 
2670 

TATTACAAGA AATGATGGTG TCATGAATTA AGGTAGACAT AGGGGAGTGC TGATGAGGAG 
2730 

CTGTGAATGG ATTTTAGAAA CACTTGAGAG AATCAATAGG ACATGATTTA GGGTTGGATT 
2790 

TGGAAAGGAG AAGAAAGTAG AAAAGATGAT GCCTACATTT TTCACTTAGG CAATTTGTAC
2850 

CATTCAGTGA AATAGGGAAC ACAGGAGGAA GAGCAGGTTT TGGTGTATAC AAAGAGGAGG 
2910 

ATGGATGACG CATTTCGTTT TGGATCTGAG ATGTCTGTGG AACGTCCTAG TGGAGATGTC 
2970 

CACAAACTCT TCTACATGTG GTTCTGAGTT CAGGACACAG ATTTGGGCTG GAGATAGAGA 
3030 

TATTGTAGGC TTATACATAG AAATGGCATT TGAATCTATA GAGATAAAAA GACACATCAG 
3090 

AGGAAATGTG TAAAGTGAGA GAGGAAAAGC CAAGTACTGT GCTGGGGGGA ATACCTACAT 
3150 

TTAAAGGATG CAGTAGAAAG AAGCTAATAA ACAACAGAGA GCAGACTAAC CAAAAGGGGA 
3210 

GAAGAAAAAC CAAGAGAATT CCACCGACTC CCAGGAGAGC ATTTCAAGAT TGAGGGGATA 

3270

GGTGTTGTGT TGAATTTTGC AGCCTTGAGA ATCAAGGGCC AGAACACAGC TTTTAGATTT 
3330 

AGCAACAAGG AGTTTGGTGA TCTCAGTGAA AGCAGCTTGA TGGTGAAATG GAGGCAGAGG 
3390 

CAGATTGCAA TGAGTGAAAC AGTGAATGGG AAGTGAAGAA ATGATACAGA TAATTCTTGC 
3450 

TAAAAGCTTG GCTGTTAAAA GGAGGAGAGA AACAAGACTA GCTGCAAAGT GAGATTGGGT 
3510 

TGATGGAGCA GTTTTAAATC TCAAAATAAA GAGCTTTGTG CTTTTTTGAT TATGAAAATA 
3570 

ATGTGTTAAT TGTAACTAAT TGAGGCAATG AAAAAAGATA ATAATATGAA AGATAAAAAT 

3630 

ATAAAAACCA CCCAGAAATA ATGATAGCTA CCATTTTGAT ACAATATTTC TACACTCCTT 
3690 

TCTATGTATA TATACAGACA CAGAAATGCT TATATTTTTA TTAAAAGGGA TTGTACTATA 
3750 

CCTAAGCTGC TTTTTCTAGT TAGTGATATA TATGGACATC TCTCCATGGC AACGAGTAAT 
3810 

TGCAGTTATA TTAAGTTCAT GATATTTCAC AATAAGGGCA TATCTTTGCC CTTTTTATTT 
3870 

AATCAATTCT TAATTGGTGA ATGTTTGTTT CCAGTTTGTT GTTGTTATTA ACAATGTTCC 
3930 

CATAAGCATT CCTGTACACC AATGTTCACA CATTTGTCTG ATTTTTTCTT CAGGATAAAA 
3990 

CCCAGGAGGT AGAATTGCTG GGTTGATAGA AGAGAAAGGA TGATTGCCAA ATTAAAGCTT 
4050 

CAGTAGAGGG TACATGCCGA GCACAAATGG GATCAGCCCT AGATACCAGA AATGGCACTT
4110

TCTCATTTCC CCTTGGGACA AAAGGGAGAG AGGCAATAAC TGTGCTGCCA GAGTTAAATT
4170 

TGTACGTGGA GTAGCAGGAA ATCATTTGCT GAAAATGAAA ACAGAGATGA TGTTGTAGAG 
4230 

GTCCTGAAGA GAGCAAAGAA AATTTGAAAT TGCGGCTATC AGCTATGGAA GAGAGTGCTG 
4290 

AACTGGAAAA CAAAAGAAGT ATTGACAATT GGTATGCTTG TAATGGCACC GATTTGAACG 
4350 

CTTGTGCCAT TGTTCACCAG CAGCACTCAG CAGCCAAGTT TGGAGTTTTG TAGCAGAAAG

4410 

ACAAATAAGT TAGGGATTTA ATATCCTGGC CAAATGGTAG ACAAAATGAA CTCTGAGATC 
4470 

CAGCTGCACA GGGAAGGAAG GGAAGACGGG AAGAGGTTAG ATAGGAAATA CAAGAGTCAG 
4530 

GAGACTGGAA GATGTTGTGA TATTTAAGAA CACATAGAGT TGGAGTAAAA GTGTAAGAAA

4590 

ACTAGAAGGG TAAGAGACCG GTCAGAAAGT AGGCTATTTG AAGTTAACAC TTCAGAGGCA 
4650 

GAGTAGTTCT GAATGGTAAC AAGAAATTGA GTGTGCCTTT GAGAGTAGGT TAAAAAACAA 

4710 

TAGGCAACTT TATTGTAGCT ACTTCTGGAA CAGAAGATTG TCATTAATAG TTTTAGAAAA

4770 

CTAAAATATA TAGCATACTT ATTTGTCAAT TAACAAAGAA ACTATGTATT TTTAAATGAG
4830 






ATTTAATGTT TATTGTAG AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT
4880

Glu Asn Leu Glu Ser Asp Tyr Phe Gly Lys Leu
    -5                  1               5

GAA TCT AAA TTA TCA GTC ATA AGA AAT TTG AAT GAC CAA GTT CTC TTC
4928

Glu Ser Lys Leu Ser Val Ile Arg Asn Leu Asn Asp Gln Val Leu Phe
                10                  15                  20

ATT GAC CAA GGA AAT CGG CCT CTA TTT GAA GAT ATG ACT GAT TCT GAC
4976

Ile Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp Met Thr Asp Ser Asp
            25                  30                  35

TGT AGA G GTATTTTTT TTAATTCGCA AACATAGAAA TGACTAGCTA CTTCTTCCCA
5032

Cys Arg Asp
        40

TTCTGTTTTA CTGCTTACAT TGTTCCGTGC TAGTCCCAAT CCTCAGATGA AAAGTCACAG

5092 

GAGTGACAAT AATTTCACTT ACAGGAAACT TTATAAGGCA TCCACGTTTT TTAGTTGGGG 
5152

TAAAAAATTG GATACAATAA GACATTGCTA GGGGTCATGC CTCTCTGAGC CTGCCTTTGA 
5212 

ATCACCAATC CCTTTATTGT GATTGCATTA ACTGTTTAAA ACCTCTATAG TTGGATGCTT 
5272 

AATCCCTGCT TGTTACAGCT GAAAATGCTG ATAGTTTACC AGGTGTGGTG GCATCTATCT 
5332

GTAATCCTAG CTACTTGGGA GGCTCAAGCA GGAGGATTGC TTGAGGCCAG GACTTTGAGG
5392 

CTGTAGTACA CTGTGATCGT ACCTGTGAAT AGCCACTGCA CTCCAGCCTG GGTGATATAC 
5452 

AGACCTTGTC TCTAAAATTA AAAAAAAAAA AAAAAAAAAC CTTAGGAAAG GAAATTGATC 
5512

AAGTCTACTG TGCCTTCCAA AACATGAATT CCAAATATCA AAGTTAGGCT GAGTTGAAGC
5572 

AGTGAATGTG CATTCTTTAA AAATACTGAA TACTTACCTT AACATATATT TTAAATATTT 
5632 
TATTTAGCAT TTAAAAGTTA AAAACAATCT TTTAGAATTC ATATCTTTAA AATACTCAAA
5692 

AAAGTTGCAG CGTGTGTGTT GTAATACACA TTAAACTGTG GGGTTGTTTG TTTGTTTGAG 
5752 

ATGCAGTTTC ACTCTGTCAC CCAGGCTGAA GTGCAGTGCA GTGCAGTGGT GTGATCTCGG 
5812 

CTCACTACAA CCTCCACCTC CCACGTTCAA GCGATTCTCA TGCCTCAGTC TCCCGAGTAG 
5872 

GTGGGATTAC AGGCATGCAC CACTTACACC CGGCTAATTT TTGTATTTTT AGTAGAGCTG 
5932 

GGGTTTCACC ATGTTGGCCA GGCTGGTCTC AAACCCCTAA CCTCAAGTGA TCTGCCTGCC 
5992 

TCAGCCTCCC AAACAAACAA ACAACCCCAC AGTTTAATAT GTGTTACAAC ACACATGCTG 
6052

CAACTTTTAT GAGTATTTTA ATGATATAGA TTATAAAAGG TTGTTTTTAA CTTTTAAATG 
6112 

CTGGGATTAC AGGCATGAGC CACTGTGCCA GGCCTGAACT GTGTTTTTAA AAATGTCTGA 
6172 

CCAGCTGTAC ATAGTCTCCT GCAGACTGGC CAAGTCTCAA AGTGGGAACA GGTGTATTAA
6232 

GGACTATCCT TTGGTTAAAT TTCCGCAAAT GTTCCTGTGC AAGAATTCTT CTAACTAGAG 
6292 



TTCTCATTTA TTATATTTAT TTCAG AT AAT GCA CCC CGG ACC ATA TTT ATT
6343

                           Asp Asn Ala Pro Arg Thr Ile Phe Ile
                           40                  45

ATA AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG GCT GTA ACT ATC
6391

Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile
    50                                      60

TCT GTG AAG TGT GAG AAA ATT TCA ACT CTC TCC TGT GAG AAC AAA ATT
6439

Ser Val Lys Cys Glu Lys Ile Ser Thr Leu Ser Cys Glu Asn Lys Ile
65                  70                  75









ATT TCC TTT AAG GTAAG ACTGAGCCTT ACTTTGTTTT CAATCATGTT AATATAATCA
6496

Ile Ser Phe Lys

ATATAATTAG AAATATAACA TTATTTCTAA TGTTAATATA AGTAATGTAA TTAGAAAACT 
6556 

CAAATATCCT CAGACCAACC TTTTGTCTAG AACAGAAATA ACAAGAAGCA GAGAACCATT 
6616 

AAAGTGAATA CTTACTAAAA ATTATCAAAC TCTTTACCTA TTGTGATAAT GATGGTTTTT 
6676 

CTGAGCCTGT CACAGGGGAA GAGGAGATAC AACACTTGTT TTATGACCTG CATCTCCTGA 
6736 

ACAATCAGTC TTTATACAAA TAATAATGTA GAATACATAT GTGAGTTATA CATTTAAGAA 
6796 

TAACATGTGA CTTTCCAGAA TGAGTTCTGC TATGAAGAAT GAAGCTAATT ATCCTTCTAT 
6856 

ATTTCTACAC CTTTGTAAAT TATGATAATA TTTTAATCCC TAGTTGTTTT GTTGCTGATC 
6916 

CTTAGCCTAA GTCTTAGACA CAAGCTTCAG CTTCCAGTTG ATGTATGTTA TTTTTAATGT 
6976 

TAATCTAATT GAATAAAAGT TATGAGATCA GCTGTAAAAG TAATGCTATA ATTATCTTCA 
7036 

AGCCAGGTAT AAAGTATTTC TGGCCTCTAC TTTTTCTCTA TTATTCTCCA TTATTATTCT 
7096 

CTATTATTTT TCTCTATTTC CTCCATTATT GTTAGATAAA CCACAATTAA CTATAGCTAC 
7156 

AGACTGAGCC AGTAAGAGTA GCCAGGGATG CTTACAAATT GGCAATGCTT CAGAGGAGAA 
7216 
TTCCATGTCA TGAAGACTCT TTTTGAGTGG AGATTTGCCA ATAAATATCC GCTTTCATGC
7276 

CCACCCAGTC CCCACTGAAA GACAGTTAGG ATATGACCTT AGTGAAGGTA CCAAGGGGCA 
7336

ACTTGGTAGG GAGAAAAAAG CCACTCTAAA ATATAATCCA AGTAAGAACA GTGCATATGC 
7396 

AACAGATACA GCCCCCAGAC AAATCCCTCA GCTATCTCCC TCCAACCAGA GTGCCACCCC 
7456 

TTCAGGTGAC AATTTGGAGT CCCCATTCTA GACCTGACAG GCAGCTTAGT TATCAAAATA 
7516

GCATAAGAGG CCTGGGATGG AAGGGTAGGG TGGAAAGGGT TAAGCATGCT GTTACTGAAC 
7576 

AACATAATTA GAAGGGAAGG AGATGGCCAA GCTCAAGCTA TGTGGGATAG AGGAAAACTC 
7636 

AGCTGCAGAG GCAGATTCAG AAACTGGGAT AAGTCCGAAC CTACAGGTGG ATTCTTGTTG 
7696

AGGGAGACTG GTGAAAATGT TAAGAAGATG GAAATAATGC TTGGCACTTA GTAGGAACTG 
7756 

GGCAAATCCA TATTTGGGGG AGCCTGAAGT TTATTCAATT TTGATGGCCC TTTTAAATAA 
7816 

AAAGAATGTG GCTGGGCGTG GTGGCTCACA CCTGTAATCC CAGCACTTTG GGAGGCCGAG 
7876

GGGGGCGGAT CACCTGAAGT CAGGAGTTCA AGACCAGCCT GACCAACATG GAGAAACCCC 
7936 

ATCTCTACTA AAAATACAAA ATTAGCTGGG CGTGGTGGCA TATGCCTGTA ATCCCAGCTA 
7996 

CTCGGGAGGC TGAGGCAGGA GAATCTTTTG AACCCGGGAG GCAGAGGTTG CGATGAGCCT
8056 

AGATCGTGCC ATTGCACTCC AGCCTGGGCA ACAAGAGCAA AACTCGGTCT CAAAAAAAAA 
8116 

AAAAAAAAAG TGAAATTAAC CAAAGGCATT AGCTTAATAA TTTAATACTG TTTTTAAGTA 
8176 

GGGCGGGGGG TGGCTGGAAG AGATCTGTGT AAATGAGGGA ATCTGACATT TAAGCTTCAT
8236

CAGCATCATA GCAAATCTGC TTCTGGAAGG AACTCAATAA ATATTAGTTG GAGGGGGGGA 
8296 

GAGAGTGAGG GGTGGACTAG GACCAGTTTT AGCCCTTGTC TTTAATCCCT TTTCCTGCCA 
8356 

CTAATAAGGA TCTTAGCAGT GGTTATAAAA GTGGCCTAGG TTCTAGATAA TAAGATACAA
8416 

CAGGCCAGGC ACAGTGGCTC ATGCCTATAA TCCCAGCACT TTGGGAGGGC AAGGCGAGTG 
8476 

TCTCACTTGA GATCAGGAGT TCAAGACCAG CCTGGCCAGC ATGGCGATAC TCTGTCTCTA
8536 

CTAAAAAAAA TACAAAAATT AGCCAGGCAT GGTGGCATGC ACCTGTAATC CCAGCTACTC
8596 

GTGAGCCTGA GGCAGAAGAA TCGCTTGAAA CCAGGAGGTG TAGGCTGCAG TGAGCTGAGA 
8656 

TCGCACCACT GCACTCCAGC CTGGGCGACA GAATGAGACT TTGTCTCAAA AAAAGAAAAA 
8716 

GATACAACAG GCTACCCTTA TGTGCTCACC TTTCACTGTT GATTACTAGC TATAAAGTCC 
8776 

TATAAAGTTC TTTGGTCAAG AACCTTGACA ACACTAAGAG GGATTTGCTT TGAGAGGTTA 
8836 

CTGTCAGAGT CTGTTTCATA TATATACATA TACATGTATA TATGTATCTA TATCCAGGCT
8896

TGGCCAGGGT TCCCTCAGAC TTTCCAGTGC ACTTGGGAGA TGTTAGGTCA ATATCAACTT 
8956 

TCCCTGGATT CAGATTCAAC CCCTTCTGAT GTAAAAAAAA AAAAAAAAAA GAAAGAAATC 
9016 

CCTTTCCCCT TGGAGCACTC AAGTTTCACC AGGTGGGGCT TTCCAAGTTG GGGGTTCTCC 

9076

AAGGTCATTG GGATTGCTTT CACATCCATT TGCTATGTAC CTTCCCTATG ATGGCTGGGA 
9136 

GTGGTCAACA TCAAAACTAG GAAAGCTACT GCCCAAGGAT GTCCTTACCT CTATTCTGAA 
9196

ATGTGCAATA AGTGTGATTA AAGAGATTGC CTGTTCTACC TATCCACACT CTCGCTT^CA 
9256 

ACTGTAACTT TCTTTTTTTC TTTTTTTCTT TTTTTCTTTT TTTTTGAAAC GGAGTCTCGC 
9316 

TCTGTCGCCC AGGCTAGAGT GCAGTGGCAC GATCTCAGCT CACTGCAAGC TCTGCCTCCC 
9376

GGGTTCACGC CATTCTCCTG CCTCACCCTC CCAAGCAGCT GGGACTACAG GCGCCTGCCA 
9436 

CCATGCCCAG CTAATTTTTT GTATTTTTAG TAGAGACGGG GTTTCACCGT GTTAGCCAGG 
9496 

ATGGTCTCGA TCTCCTGAAC TTGTGATCCG CCCGCCTCAG CCTCCCAAAG TGCTGGGATT

9556 

ACAGGCGTGA GCCATCGCAC CCGGCTCAAC TGTAACTTTC TATACTGGTT CATCTTCCCC 
9616 

TGTAATGTTA CTAGAGCTTT TGAAGTTTTG GCTATGGATT ATTTCTCATT TATACATTAG 
9676 

ATTTCAGATT AGTTCCAAAT TGATGCCCAC AGCTTAGGGT CTCTTCCTAA ATTGTATATT

9736 

GTAGACAGCT GCAGAAGTGG GTGCCAATAG GGGAACTAGT TTATACTTTC ATCAACTTAG 
9796 

GACCCACACT TGTTGATAAA GAACAAAGGT CAAGAGTTAT GACTACTGAT TCCACAACTG 
9856 

ATTGAGAAGT TGGAGATAAC CCCGTGACCT CTGCCATCCA GAGTCTTTCA GGCATCTTTG

9916 

AAGGATGAAG AAATGCTATT TTAATTTTGG AGGTTTCTCT ATCAGTGCTT AGGATCATGG 
9976 

GAATCTGTGC TGCCATGAGG CCAAAATTAA GTCCAAAACA TCTACTGGTT CCAGGATTAA 
10036 

CATGGAAGAA CCTTAGGTGG TGCCCACATG TTCTGATCCA TCCTGCAAAA TAGACATGCT

10096 

GCACTAACAG GAAAAGTGCA GGCAGCACTA CCAGTTGGAT AACCTGCAAG ATTATAGTTT 
10156 

CAAGTAATCT AACCATTTCT CACAAGGCCC TATTCTGTGA CTGAAACATA CAAGAATCTG 
10216 

CATTTGGCCT TCTAAGGCAG GGCCCAGCCA AGGAGACCAT ATTCAGGACA GAAATTCAAG

10276 

ACTACTATGG AACTGGAGTG CTTGGCAGGG AAGACAGAGT CAAGGACTGC CAACTGAGCC 
10336 

AATACAGCAG GCTTACACAG GAACCCAGGG CCTAGCCCTA CAACAATTAT TGGGTCTATT 
10396

CACTGTAAGT TTTAATTTCA GGCTCCACTG AAAGAGTAAG CTAAGATTCC TGGCACTTTC 
10456 

TGTCTCTCTC ACAGTTGGCT CAGAAATGAG AACTGGTCAG GCCAGGCATG GTGGCTTACA 
10516 

CCTGGAATCC CAGCACTTTG GGAGGCCGAA GTGGGAGGGT CACTTGAGGC CAGGAGTTCA 
10576

GGACCAGCTT AGGCAACAAA GTGAGATACC CCCTGACCCC TTCTCTACAA AAATAAATTT
10636 

TAAAAATTAG CCAAATGTGG TGGTGTATAC TTACAGTCCC AGCTACTCAG GAGGCTGAGG 
10696 

CAGGGGGATT GCTTGAGCCC AGGAATTCAA GGCTGCAGTG AGCTATGATT TCACCACTGC 
10756

ACTTCTGGCT GGGCAACAGA GCGAGACCCT GTCTCAAAGC AAAAAGAAAA AGAAACTAGA 
10816 

ACTAGCCTAA GTTTGTGGGA GGAGGTCATC ATCGTCTTTA GCCGTGAATG GTTATTATAG 
10876 
AGGACAGAAA TTGACATTAG CCCAAAAAGC TTGTGGTCTT TGCTGGAACT CTACTTAATC
10936 

TTGAGCAAAT GTGGACACCA CTCAATGGGA GAGGAGAGAA GTAAGCTGTT TGATGTATAG 
10996 

GGGAAAACTA GAGGCCTGGA ACTGAATATG CATCCCATGA CAGGGAGAAT AGGAGATTCG 
11056 

GAGTTAAGAA GGAGAGGAGG TCAGTACTGC TGTTCAGAGA TTTTTTTTAT GTAACTCTTG 
11116 

AGAAGCAAAA CTACTTTTGT TCTGTTTGGT AATATACTTC AAAACAAACT TCATATATTC 
11176 

AAATTGTTCA TGTCCTGAAA TAATTAGGTA ATGTTTTTTT CTCTATAG GAA ATG AAT 
11233 

Glu Met Asn 
85 



CCT CCT GAT AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG
11281

Pro Pro Asp Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln
        90                  95                  100

AGA AGT GTC CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA
11329

Arg Ser Val Pro Gly His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser
    105                 110                 115

TAC GAA GGA TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA
11377

Tyr Glu Gly Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys
120                 125                 130                 135

CTC ATT TTG AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC
11425

Leu Ile Leu Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe
                140                 145                 150

ACT GTT CAA AAC GAA GAC TAGCTATTAA AATTTCATGC C
11464

Thr Val Gln Asn Glu Asp
            155
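
The codon rows of this listing can be cross-checked against the amino acid annotations printed beneath them. The following is a minimal Python sketch, assuming the standard genetic code; the names `CODON_TABLE`, `THREE_LETTER` and `translate` are illustrative helpers and are not part of the listing.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering
# (first base varies slowest); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# One-letter to three-letter amino acid codes, as used in the listing.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Stop",
}


def translate(row: str) -> list[str]:
    """Translate a row of whitespace-separated codons into three-letter codes."""
    seq = "".join(row.split()).upper()
    if len(seq) % 3:
        raise ValueError("row length is not a multiple of three")
    return [THREE_LETTER[CODON_TABLE[seq[i:i + 3]]] for i in range(0, len(seq), 3)]


# Last coding row of the listing above (residues 152 to 157) plus the stop codon.
last_row = "ACT GTT CAA AAC GAA GAC TAG"
print(" ".join(translate(last_row)))  # Thr Val Gln Asn Glu Asp Stop
```

Running `translate` on any other coding row should reproduce the three-letter annotation printed beneath it; a mismatch would point to a transcription error in either the codon row or the annotation.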



(15) INFORMATION FOR SEQ ID NO: 14:
(i) SEQUENCE CHARACTERISTICS:
    (A) LENGTH: 28994 base pairs
    (B) TYPE: nucleic acid
    (C) STRANDEDNESS: double
    (D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic DNA
(vi) ORIGINAL SOURCE:
    (A) ORGANISM: human
    (F) TISSUE TYPE: placenta
(ix) FEATURE:
    (A) NAME/KEY: 5' UTR
    (B) LOCATION: 1..15606
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: leader peptide
    (B) LOCATION: 15607..15685
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: intron
    (B) LOCATION: 15686..17056
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: leader peptide
    (B) LOCATION: 17057..17068
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: intron
    (B) LOCATION: 17069..20451
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: leader peptide
    (B) LOCATION: 20452..20468
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: mat peptide
    (B) LOCATION: 20469..20586
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: intron
    (B) LOCATION: 20587..21920
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: mat peptide
    (B) LOCATION: 21921..22054
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: intron
    (B) LOCATION: 22055..26827
    (C) IDENTIFICATION METHODS: E
    (A) NAME/KEY: mat peptide
    (B) LOCATION: 26828..27046
    (C) IDENTIFICATION METHODS: S
    (A) NAME/KEY: 3' UTR
    (B) LOCATION: 27047..28994
    (C) IDENTIFICATION METHODS: E
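
The feature coordinates above can be checked for internal consistency. The following is a minimal Python sketch, assuming 1-based inclusive coordinates as is usual in sequence listings; the names `FEATURES`, `check_tiling` and `coding_length` are illustrative and do not appear in the listing.

```python
# Feature table of SEQ ID NO: 14 as (name, start, end), 1-based inclusive.
FEATURES = [
    ("5' UTR", 1, 15606),
    ("leader peptide", 15607, 15685),
    ("intron", 15686, 17056),
    ("leader peptide", 17057, 17068),
    ("intron", 17069, 20451),
    ("leader peptide", 20452, 20468),
    ("mat peptide", 20469, 20586),
    ("intron", 20587, 21920),
    ("mat peptide", 21921, 22054),
    ("intron", 22055, 26827),
    ("mat peptide", 26828, 27046),
    ("3' UTR", 27047, 28994),
]
TOTAL_LENGTH = 28994  # (A) LENGTH in the sequence characteristics


def check_tiling(features, total_length):
    """Return True if the features cover 1..total_length with no gaps or overlaps."""
    expected_start = 1
    for _name, start, end in features:
        if start != expected_start or end < start:
            return False
        expected_start = end + 1
    return expected_start == total_length + 1


def coding_length(features, key):
    """Sum the lengths of all segments whose name equals `key`."""
    return sum(end - start + 1 for name, start, end in features if name == key)


leader_nt = coding_length(FEATURES, "leader peptide")
mature_nt = coding_length(FEATURES, "mat peptide")
print(check_tiling(FEATURES, TOTAL_LENGTH))  # True
print(leader_nt, leader_nt // 3)             # 108 36
print(mature_nt, mature_nt // 3)             # 471 157
```

Under these assumptions the three mature-peptide segments add up to 471 nucleotides, that is 157 codons, which agrees with the residue numbering that ends at 157 in the preceding listing.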

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:



ACTTGCCTTA AAAGCTTTGC ATAGGTAGAC AACATTAGAT TAATTTCCTT GCTCACATCT
60

GTTCAAGAAA AATCATTTAA GTTATAAAAT ATAACAAACC TTCTGCATTA TAAGACTGAT
120

GTTTAGAAAT ATAAACATTT TATACATCAC CATTTAAATC TTTCTCCAAG GCTTCATCTT
180

TATAAAATAG TCCGGAAATT TCAGAGAAAG ATGAATCTGA TTTTCCAAGA GAGGACAGCT
240

GTGGACTATC TGGCACTGGA GACTAAATAA AGAAAGCAGG TACAGTCAAT AAGATCTTCA
300

GGACATATAC ATTTTGTTTA TTAAGAAAAA GCAAATAAAA CATTTTTCAG AAAAAGGCAA
360

ACATGCTAGA AAGCATATGA CTTAGTCATT TGAGTTTTTA TTATTAAGGA AATTTACAGG
420

CCCAAGAAAC ACCTTGCTCA ATATATTAAA TTTTATTTTG GTTTTCAACT AGACTTTGCT
480

TTTCATTTGT TTGTTTTTGT GACAAGTTCT CGCTCTGTCA CCTAGGCCAA AGTGTAGTGA
540

CACAATCTTA GCTCACTGTA GCCTCCTAGA TTCAAGTGAT CCTCCTGTCT CAGACTCCTG
600

AGTAGCTAGG ACTACAGGAA CATTCCACCA TGCCCAGCTA ATTTTGTTTT GTTTTGTTTT
660

GTTTTCAGAG ACAATGTATT GCAGCGTTGC CCAGGCTGAT CTGAAACTCT TAGCCTCAAA
720

CGATACTCCT GCCTCAGCCT CCCAAAGCAC TAGGATTACA GACATGAGCC AATGCGCCCA
780

GCCTTAAATT AGACTTTAAA TGTGGTTTTA AACTCCTGTT GAAAAAGCGT CTGGTATCTT
840

GAACCAGTAG ATGTTTTCAT AGCAATGAAG CTAAACTGTA ATTTAGACAG TAGCCAAATG
900

CTTGTGAAAT TTTGCTAAAT AATATAATCT TCAAGGGAGC AAATCATGTC CCAAATGCAA
960

AAGATCAACT GGTGGGGGCA GTAGTAAAAG ACAGGATACT GTGCTCTTTA AAAGGTCAGT
1020

AACTATAGTA CCTAGTTATC TTACTTATCA CAGCAAAATA ATTACATAAA ATCCTATGGA
1080

TCATAAAGGC ACAGACTCAC TTCTGTCTCT AGATCTCAAG CTACCAAAAA GAAATCTCCC
1140

AATAGTTTCT TGGAGGCCTA TACTTAGTGA AAAAGCAGCT GGAATCAACA TAGTTCCTCC 
1200 

TATGTTGTAG GACAATCCTA GCTCTGGGCA TACGAATACA TTAAATCCCA CTTATCTATA 
1260

GAGCTTTCTT AAAGGGAAGA AATTTGAGTA GTATGTAAAA CAGAATAAAA GATTAAGGCT 
1320 

CCATAGGCAT ACAGCTTACC TCCAATTCTC TTGGCCTCTT GCAATTTCTA TTATCAGGCT 
1380 

TTACAAGGTG ATTTGCCATC ATATTCCGAA GGCACCAGCT ACAAAGCTTA GAACAATGCC

1440 

AGATTTAGGT ACAAACTCCA TGCTACAAGC TCTCTGGAAT CCTTCCCTGT TTCCCACTCC 
1500 

TACTGCTGAT GTTAATTTAG ACTGTCATTA TCTGTCACTT TCCTAAACTC AATTTCTCCC 
1560 

TCCTCTAAAT CATTCTATCA ACTGCTATTT GGGTAATCTT TCAAAACTTT GATTACTGCA 
1620 

TTCCTTTAAC TCAAAAACTT TCATTGTTCC AGAATAAGTT GAAATTCCAT GATATGGCCT 
1680 

TCAAGGTCCT GTATTATCTG GTGCAAGCCT ACTAGTCCCA TCATTTTCAA CTACTCCTCT 
1740 

CTATGTACTT AGCCAAATGA GTCTCTCTGG CAATTCTGCC TTGTTTCAGG ACTGGCTCAG 
1800 

TTAAGATTCT TTTATCTTCG GCCGGGCGCG CTGGCTCACG GCTGTAATCC CAGCACTTTG 
1860

GGAAGCTGAG GCAGGAAGAT CACCTGAGGT CGGGAGTTCG AGACCAGCCT GGCCAGCATG
1920

GTGAAACCCT GTGTCTACTA AAAATCCAAA CATTAGCCAG GCGTGGTGGC AGGCGCCTGT
1980

AATCCCAGCT ACTTGGGAAG CTGAGGTGAG AGAATCGCTT GAACCCAGGA GAGGGAGGTT 
2040 

GCAGTGAGCC GAGATTGTGC CATTGCACTC CAGCCTGGGC AACAGAGCGA GACTCCACCT 
2100 

CAAAAAAAAA AAGGATTCTT CTATCTTCAC AAAATCTTAA TGTTTAAACA GGTCTTACAG 
2160 

TTCATCTAAT TCAATCTCAT TTTTTACAAG TGAGAAAACA GGGACAGTGA CGGTGGATCA 
2220 

AGTGACACCA GTAAGACTGA GCTAAATTAG AACCGAGATC TCACTCGAGT CTGAGGTTAT

2280 

TCCCACTGTC CAACCTTACT TTAAAGTAGC TTCAAATTTT ACTTTTACTT TTCCATAAAT 
2340 

TCGGAAGGGA TTTTCCCTAG GAGTCCAAAT GTTGAAACCT GGAAGGGTAT AGTCTCTGTG 
2400 

TCTTTGAGAT GAGGGGAGCC CTGTCCATAT TCAAGTTATC AATTGACTTT GTTGTTTTTG

2460 

AGAAACGATG CTGATTTGGG TAACTTTAAC ACATCTGTTT GATTAGTCCT ATAAAATATG
2520 

CATATATAGA AGACAGAAAG AGCAACAACA AATTTGAAAG ATGCTTGTTA AGTAAATTCT 
2580 

GTATCGTACG TGTCCATTCC TGCCAGTACC TTTATAGTAT GTAAGTTTAC GTGCTGTAAT 
2640 

AGTATTAATA GTATCTAGAA AATACTACAC ATGCACAGCA GTGCTAACTT TGCCTTGGGA 
2700 

GTTGGAAAAT ACTTCAGAGA AGCCAACAGG CAGATTTTTC TCTCTTCCCT TCCCCTTCTA 
2760

ATTTTCCCTT TCCCCTTCAC CCCCTTCTCT TCTCTCCCCA AGTAACACTG TGCACCTATG 
2820 

TCAAACGAAA ACTTATAATC AAGTAACTGT TTCTGCAAAA ATAAGTTCGT TTTCCTGTCA 
2880 

TGGCTCAAGG CCTCAGCAGA TCCAGGCCTG GTGGACGGGC TGGTCTTCGT CGTGTGCCAA 
2940

ACACTGACCA CTGCCCTGGC TCTGCCATCT
3000 

CTGTCCCCTC TGCCCCATGC AGCTGTCTCC 
3060 

CCTAGCCCCT CAGGCCATTT CACCTCCATT 
3120 

TCCTCCCTAC TGTTGTTTCC GCCCCACTAG 
3180 

CCCTTCCTTG TGTCACAGCC CGTCACATTC 
3240 

CAAGGCCAAT GTACTTCGCG GTATGGGGAC 
3300 

GACCCTGGGC GCGGGGTGCT CGGACTTCGG 
3360 

AGCAGCCCCT GCACGAGTCA CGTGACAGCT 
3420 

CCGTAGCCTC CCAGAGCCAG GCCCCACGGA 
3480 

CTTTCCCCTC CACTTGGAAT ACTCGTGAAA 
3540 

TTTGAACCAG GAAAAAATCT GAAACTGGTC 
3600 

AGGCCGGTGT GTGTCCCAGC AGCTTAGAAT 
3660 

ACGGCCTCTT TTCGAGTAAA ATTTACTTGG 
3720 

TTGCAGATGC TCTGTTTGCA GGAAGGCTTT 

3780 

GCTTTTAGAT CCAGAGCCTC AGTTACTGCC 
3840 

CAGAATCACG CCTTCTTAGA AAATTCTTAC 
3900 

GGCAACAGCT ATCAAAAAGT GTTGCATAAC 

3960 

TCATCGGGTG TTTTAATGCG GAGGCTTTGA 
4020 

TTACGCCCAG TTAGTGGATG TGGAAGAAAA 
4080 

TCAATAACAC ACATCCATAA GCTCCAGGTA 
4140

AACATTTACA TACATTACTA AGGTTTTTTT 
4200 

CATTAAGTCT TGGAATTATT CTGTGTGTGT 
4260 

TTGTTTTAAA TAAGTTCCTA GAAAATAAGC 
4320 

TTGTGAAATA TAGGCCACAT AATACTAGCC 
4380 

AACACTGTCC CAGAACAATA GCAATCTGTG 
4440 

GGGAAAAGGA ATGAAGTTTT AGTTTGCCTT 
4500 

AGCAGTATAA AGTTTTCATC AAGTCAAATA 
4560 

TGATACCATG TCCTTCCTAA TTTGGGGGGC 
4620 

ATTAAAAATT CTTACATTTT TAGTGTCCTT 
4680 

TTTATCTCCT CCTCCTTATT ATCATGGTTG 
4740 

CAAGGGTCTG GGAAATACTC ATGGAATTCA 



TAGGCTTAGT 


GACCTGGCTG 


TTACTAAGCA 


TTCTAGTCTT 


CTCCCTCTTC 


TCAACGCGAT 


TTCCCTCACT 


TCCCGCCGCC 


CCTCCGCACT 


AGCCCCTCAG 


AGAAAGTTTC 


CATCCTCGCA 


TCACAGGCGC 


CCATCCCTCC 


AGCCCCACCC 


CTTCCTCGTC 


AGCGAACGCG 


AGGGAGTGAA 


GGGTGGAGGT 


GGGAAGCGCG 


CCGCACTCCC 


CTCCCACCAC 


CACCCCCCCC 


AACTTCCCCA 


AAGGCAGCTT 


TTTCCCGGTT 


TTCTCCCGCT 


CAAAAATCTC 


TCCCTGCCAC 


CCTGTGTGTG 


AAGAAAGAAC 


AAGGAAGACT 


TGCCAAAGCA 


CTCAGCAAAG 


GAACACAAAA 


TAGCACATCC 


TTTGTTTGCA 


GGAAGGGTTT 


AAAACTGCGT 


AATCACGTGT 


TCCCCTGGCC 


CACAAGCAAG 


CCCTCTTCCT 


CTTTGGTGCA 


ACCAAACGTT 


CCCGGGTGTG 


TCAATAAGTT 


AAGTCTAATT 


ACACATGGCT 


CACATAATTG 


TAGCTTTGCC 


CCTGCAATTT 


CAAAGATATA 


CATTCCAAGC 


AAAAAAGCAA 


ATTACCTCAT 


AACACAAAGG 


CAAAATCTTA 


CATCTTAGAG 


AACTATATTT 


TTTCCTTTTG 


CTTGATTAAA 


TGTTAGTTAT 


ATATTTATTT 


GCTGTTTGTG 


AAGAAGCCGG 


GCTCAATGTG 


TTTAATCTGA 


GTTGCTAATA 


TAGATAACTA 


TGGCGAAGTA 


AGGAGTCTCA 


TTGAATTTTT 


ACCCTCTGTG 


GTAAAATGAA 


AATTTTTATC 


TTTATTGTTT 


CAGACTCTTC 


TATTCACTTT 


AAAGTGACTG 


TGCTTTATTC 


CAGGTGAGAT 

v»nuu x unvjn x 


AAGTTTT A Tf^ 

flow X X X X X VJ 


AAA T A A A A Aft 

nnxii/uinnnU 


CCTTGGTAAA 


ATGTAGAGTT 


GTCCACTGTG 


CTGTTATTAT 


TTTTAATGGT 


TCATTAAACC 


TGTCACAGCC 


TTCACACTGT 


ATGATATTTA 
4800

AACAGGTGGT TGTCCATCTG ATTCTTAAAA 
4860 

CATAAATGCT TTCATCAGAT TAAGAGAACA 
4920 

AAATATTAAC TTCCATTGCA TAAGCTAAAT 
4980 

CTAGAGCTTT AAAATATTCA AAGGGCTGTC 
5040 

CTCTATGAAG TACAAAGGAC ACTGAGACAT 
5100 

TGTCCCAAAA CTTCAAAATG TGTAAATTAC 
5160 

TCACTGGCAA TGGAATATTT AAAATTAGAG 
5220 

TTGATCCCAC TTCGTGCTTT CATGTTAATT 
5280 

AAAACTTACT ATTTCAACTT GAGTCACGTA 
5340 

CTATTTTTTT TCTTCTGATA GTCACCACAC 
5400 

TCCTTTGTAA TCACTGTTGA AGGACATGAT 
5460 

ATCTTGTTTT TAAAACAAAC AAACCAACAA 
5520 

CTGACTCTAG GAACCCCTCT GTTTTTATAT 
5580 

AATGCCACCT TGCTAATTCC CTTCCTAGCA 
5640 

AAAAATGGCT AAGAAATTTT GTTTCCACTA 
5700 

TGCTCTTCAA ACGTTACATT TTATAAGACT 
5760 

ATATATGTAT CCTTAAATTG TATTTCAAAT 
5820 

GAATAGAAAG TTTTAACACT GGAAACTGCA 
5880 

GGTAGCCCTC TCCCCAAGCT TACTTTCTGT 
5940 

TTCTAATGAT GGTATCCGTG TGGCTTGCAT 
6000 

AATGGACTCC TGTTCCACTG AAAAGTAAAT 
6060 

ACCATATTGA GCTTGGGAGG AAGGGGAAGT 
6120 

ATTTGCAAAA TGTTCCTTTT TTTAAAATGT 
6180 

CATTGTAGGA ATTACCCCAA TAGGACTGAT 
6240 

TGTGCTGAAG TGTGACCAGG AAGTCTGAAA 
6300 

CTTCTAATGG ACTAAGGAGG TGCTTTCTTA 
6360 

CAGGTTTTGG AAGGCACAGA GCCCCAACTT 
6420 

GATATTACAT TAAAAGAAGT ACTCGTATCC 
6480 

TAGGAAAGAG CCTGTTTGAA GGCGGGCCCA 
6540 

CCTTCTTCCT CATTCTCTCC CCAGCTTGCT 
6600 



TATTTCCAAG AAAAATGATT CCACCTAATf 
CCATGGACAT TTTATTTTAT TTTATTTTT 
GGGTAGGAAT AAGTGAGATG ATATTGTTA' 
AT CAT TAT CT CATTTAATCT TTGAAAACAi 
TTGTTGCTCT ATATCAAAGA AAAAAGTGT' 
ACATTCTGCA TCTTTACAGC TGGAGAAAA' 
CTTGCTTAGT GTGCTGCTTC TGATCACTAt 
GGCCCAATTG GACTCTACAG TTGGAAGGTC 
TGTATTCTTA TCATATACTT CTTAAAGGTi 
CAAGCACTTC CAGCCACCCT GCCACAGAC! 
GTTTTTATGA CTTCCCGAAA TGAAAACCC1 
AAAGTAGTGT TTATGTAAGC ATTTTGTTCC 
CAACTCTGTA CTGGCAAAAC ACAAAAACAJ 
AAGTAATACA GTTTAGCACA TGTTCAAGA? 
ATTATTTTCA AGACTGTGAT ATTTACACTC 
ATTTTTTAAC ATGTTGAACA TAAGCCCTA? 
ATTTTAGGTC AGTCTTTGCT ATCATTCCAC 
AGTAAATATT TGCCCTCTTA CCTGAATTT1 
TGCAGAAAGT GTAAAAATTA TTACATAAA? 
CTGATACAGC AGATAAAGAA GTTTTATGA/ 
CTTAATGGCC TGTATCAACT ATCCTTTGAC 
CCTGAATGAG GTTATAAAGT AAAAGAAAA1 
TACATTTTAG AAATATTTTA AGTGTTGTAP 
TATTCCGCAT TGTAAAATAA GAAAAAGTT1 
ATGAAGAGAG ACAGATGACA AAAGAAGATG 
AAGTCAGAAA GAGATACTCA GAAAGAGGTJ 
TTACGGAAGA AAAGATTTCA TGAAAATAGT 
TCTGCCACTT TATTTCGACT TCCATTGCCC 
AGGAGTGCCG ACAGCAGTCT CCTCCCTCC^ 
GAGCCCTTTG CTCCCCTGGC GACTGCCTGG 
ACAGTCAGCA AGGAATTGTC TCCCAGTGCA TTTTGCCCTC CTGGCTGCCA ACTCTGGCTG
6660 

CTAAAGCGGC TGCCACCTGC TGCAGTCTAC ACAGCTTCGG GAAGAGGAAA GGAACCTCAG 
6720

ACCTTCCAGA TCGCTTCCTC TCGCAACAAA CTATTTGTCG CAGGTAAGAA ATATCATTCC
6780 

TCTTTATTTG GAAAGTCAGC CATGGCAATT AGAGGTAAAT AAGCTAGAAA GCAATTGAGA 
6840 

GGAATATAAA CCATCTAGCA TCACTACGAT GAGCAGTCAG TATCAACATA AGAAATATAA 
6900

GCAAAGTCAG AGTAGAATTT TTTTCTTTTA TCAGATATGG GAGAGTATCA CTTTAGAGGA 
6960 

GAGGTTCTCA AACTTTTTGC TCTCATGTTC CCTTTACACT AAGCACATCA CATGTTAGCA 
7020 

TAAGTAACAT TTTTAATTAA AAATAACTAT GTACTTTTTT AACAACAAAA AAAAGCATAA 
7080 

AGAGTGACAC TTTTTTATTT TTACAAGTGT TTTAACTGGT TTAATAGAAG CCATATAGAT
7140 

CTGCTGGATT CTCATCTGCT TTGCATTCAG ACTACTGCAA TATTGCACAG AATGCAGCCT 
7200 

CTGGTAAACT CTGTTGTACA CTCATGAGAG AATGGGTGAA AAAGACAAAT TACGTCTTAG

7260 

AATTATTAGA AATAGCTTTC ACTTTAGGAA CTCCCTGAGA ATTGCTGCTT TAGAGTGGTA 
7320 

AGATAAATAA GCTTCTCTTT AAACGGAATC TCAAGACAGA ATCAGTTACA TTAAAAGCAA 
7380 

ACAAAAAATT TGCCCATGGT TAGTCATCTT GTGAAATCTG CCACACCTTT GGACTGGGCT

7440 

ACAATTGGAT AATATAGCAT TCCCCGAGAT AATTTTCTCT CACAATTAAG GAAAGGGCTG 
7500 

AATAAATATC TCTGTTTGAA GTTGAATAAC AAAAATTAGG ACCCCCTAAA TTTTAGGGCT 
7560

CCTGAAATTC GTCTTTTTGC CTATATTCAG CTACTTTACG TTCTATTAAA TCTTCTTTCA 

7620

GGCCAGGTGC ACTAGCTCAT GCCTAGAATC TCAGGCAGGC CTGAGCCCAG GAATTTGAGA 
7680 

CCAGCCAGGG CAACACAGTC TCTACAAAAA AATAAAAAAT TACCTGGGTG TGTTGGTGCA 
7740

TGCCTGTAGA ACTACTCAGG ATGCTGAGGA CTGCTTGAGC CCAGGATAGC CAAATCTGTG 
7800 

GTGAGTTCAG CCACTAAACA GAGCGAGACT TTCTCAAAAA AACAAACAAA AAAACAAACA 
7860 

AACTTCCTTC AAAATAACTT TTTATCTGCA ATGTTTTCCT ATTGCCTGTG AGATTAAATT

7920 

TACTCTTTTA CCTGATTTCC AAAGCCCTCC ATAATCTAAT CCGACTTTAC CTTGTGTTCA 
7980 

CTGCAAAATA GCAGGACTGT TCCACTACAA TCCAAAAATC ACAGGTTGGG TGCAGTGGCT 
8040 

CACTCCTGTA ATCCCAACAC TTTGGAAGGC CAAGGCAGGT GGATTGCTTC AGCTCAGGAG

8100 

TTCAAGACCA GCCTGGGCAA CATGGCAAAA ACCCTGTCTC TCCAAAACAT ACAAAAATTA 
8160 

GCCAGATGTG GTAGTATGTG CCTGTAGTCC CAACTACTCA AAAGGCTAAG GCAAGAGGAT
8220 

CACTTGAGCC CAGGAGGTCA AGGCTACAGT GAGCCATGTT TACTGTGTCA CTGCACTCCA

8280 

GCCTGGGTGA TAGAGCAAGA CCATGTCTCA AAAAAAAAAA AAAGAAAAGA AAAGAAAAAA 
8340 

ACATCGCTCT ATTCAGTTCA CCCCCACCAC AAGATTGTTT TGATTATCAC ATAAATGCTG 
8400

GTCCATTGCC TTCTCTATCT ATTCAAATCT TTAAGCATTC TTTGAGATTC AACTCAATTC 
8460

TCCTTTTCAA ACTAGGCCAT TTAAACTACA TCAGTTCCAT TTTGATTTTC TTGCTTTGA 
8520 

TCTACAGACT CAAAAACAAA AACTTAAAAA CTTATTTTTT AAGTTTTCTG CTACTCTCA 
8580 

TTCTTCAACA CTCACATACA CGCATTCATA ATAAGATGGC AGAATGTTCA AGGATAAAA 
8640 

GATTTATAGA ACTGAAAAGT TAGGTTTTGA TCTTGTTGCT GTCAAGATGA CTACCTACC 
8700 

GATCTCAGGT AATTAATTAT GTAGCATGCT CCCTCATTTC ATCCCATACC TATTCAACA' 
8760 

GATTGGAATT CCACAGCAAG GATAAACATA ATCATAGTTG CTTTTCAAGT TCAAGGCAT' 
8820 

TTAACTTTTA ATCTAGTAGT ATGTTTGTTG TTGTTGTTGT TGTTTGAGAT GGAGCCCTG 1 
8880 

15 TGTGTCACCC AGGCTGGAGT GCAGTGGCAC GAACTCGGCT CACTGCAACC TCTGCCTCA' 

8940 

GGGTTCAATC AGTTATTCTG CCTCAGTGTC CCAAGTAGCT GGGACTACAA GGCACATGO 
9000 

ACCATGCCTG GCTAATTTTT GTATTTTTAG TAGAAACAGG GCTTCACCAT GTTGGCCAG( 
9060 

CTGGTCTCGA ACTCCTGACC TCAAGTGATC CAGCCGCCTC GGCCTCCCAA AGTGCTGGGJ 
9120 

TTACAGGCAT AAGCCACCGT GCCCAGCCTA ATAGTATGTT TTTAAACTCT TAGTGGCTTJ 
9180 

ACAATGCTGG TTGTATAATA AATATGCCAT AAATATTTAC TGTCTTAGAA TTATGAAGAJ 
9240

GTGGTTACTA GGCCGTTTGC CACATATCAA TGGTTCTCTC CTTACAGCTT TAATTAGAG r 
9300 

CTAGAATTGC AGGTTGGTAG AGCTGGAACA GACCTTAAAG ATTGACTAGC CAACTTCCT 
9360 

GTCCAAATGA GGGAACTGAG ACCCTTAAAA TTAAGTGACT TGCCCCAGAC AAAACTGGA; 
9420

CTCATGTGTC CTAATTTCCA TCATGAAATT CTACCATTCA CTAGCCTCTG GCTAGTTGTC 
9480 

AAAGTATTGC ATAACTAAAT TTTTATGTCT GTTTTAAAGA ACAAATTGTC ACTGCTTACT 
9540 

CCTGGGAGGG TCTTTCTGAG GTGGTTTATA ACTCTTAAAA AAAAAAAAGT CAGTAGTCTC 
9600 

AGAATTTTAG ACGAAATAGT CAAAGCATTT TTATCCAATG GATCTATAAT TTTCATAGAT 
9660 

TAGAGTTAAA TCAAAGAAAC ACGGATGAGA AAGGAAGAGG AAAATTGAGG AGAGGAGGAJ 
9720 

40 TGGGGATGAG AACACACTAC TTGTAATCAG TCATAGATGT ACTGAGAACT AACAAGAAG/ 

9780 

ATTGTAAGAA AATAAGAATG AAGAATTCAA AATCAACACA TGAAATAAAA AGAAACTAC1 
9840 

AGGGAAAAAT GGAGAAGACA TTAGAAAAAT TATTCTATTT TTAAAATTCT GTTTTCAGGC 
9900 

45 TTCCCTCCTG TTCTTCCTCC TTCTCATTGG TTTTCAGGTG GAGGGAAAGT TTAAGATGG* 

9960 

AAAAATATAT ATATTCTACA CATCCCTTTC TACGCTGTTG TCATGGCAAC AAGGTTTATC 
10020 

ATAGCAAACT TTTATTCATA CAACATTTAT TGAGTTCTTA CTGTGTGGTA AGCTCTTTCC 
10080 

AGGTGTTGAA AATTCAGGGG AAAAAAGACA ACTCATTGTC TTAAAACTCA GATGAAAGC1 
10140 

GAACAGACCT ATTTTTAATC AAAGTAATCT CAATTTAGGG TAGTAAGAGC TATTTAAGAfl 
10200 

GCATGAACAG GTGTGAAGGA GGTAGGACTC TGAGGAGAGA AT AG TT AG CT AGGAATGAAA 
10260

GAGCAGAGAA GTTTTCCTAG AGGAACTATT AAAGCTGGGA GTTACGGGAT GAAAGATGAG
10320 

GCAGGGTTTG CAGGCAAAAA AAAAAAAAAG GCAGGGGAAG GGGAAGTTCT GGCCTGGCAG 
10380 

AGAGAATAAC TGTGGCAACA ATGGAGGAGA GTCTGGAAGC AAGAAAACCA AGTAGAAGAG 
10440 

TATTAAAATA GAAGATGCCA GGGGTAATGA GGGCTTGATT TAAAACAGTG CTGTTGGAGA
10500 

TGGAGAGGAG ATACCAAATT CTGGAGACAT TTCTGAGTTA GAACCTACAG TATTTATCAG 
10560 

ACAAGGGAAA GATTAGACAA AGGAGTTAAG AATGACTCCC AGGTTTCAGT TTGGGGCAGG 
10620 

TAACTAGGAC ATGTTTTGAA AAGTAATGTA TTGGATCTCT TACCATTGGA ACTATGTATG 
10680 

TGGAGCCAAA TTAAAATTTG TACATGTATA TAACTCTCCC CCCACCACCA GTAACTACTT 
10740 

CCCTAACTCT CTACTTTGTA GCCAGACTTC CTAAAAGAAT AGTTTGTAGT CACTGTCTTT 
10800 

ACTTTTCCCC TCCCATTCTG TCCTAGATAT TTGTCCACCT ACCATCTGCT GCCTCCACTT 
10860 

TACCCAAACT GTTCTACGGT TGCCCAAAAC TTCCTAATTG CCAAATTCAA TGAACAAGTT 
10920 

TAAGCTTATA TGTAAATTAG GAGCTCTACA GTTTGATTTC GAGCAGCCCC TCCTGAAACC 
10980 

CTTTCTCTTT CGACTTCTGT GACACATCTC AGATTTACAA AACTGAACTA ATTATTTTAC 
11040 

ACTTGAGCTG TATTTTCGTT CTTCTTTCTT GATGAATGAG GTAACCACTC AACAAATTGC 
11100 

CCAAGCCAAA AACTACGAAG TCATCCTCAG TTCCTCCTTC TTCTGTTTGA CCCACAACAG 
11160 

ATCAGCTGAG AAATCCCGCT GTTTAGTATC TCTTGAATTC ATTACCTTAA TTTATAGCCT 
11220 

CATCAACTCT TAATTGTTAA AATTACTTCA GTAGTTGTTG TCTGACCTCT GTCCAATCTT 

11280

GTTCAATCAG GTCCATTCTT TTGTTCTTGG TGGTGGTGGT GGTGTTGACA GAGTTTCGCT 
11340 

TTTGCTGCCC AGGCTGAAGT GCAGTGGAGC ACTTCACTGC AACCACAGCC TCCTGGGTTT 
11400 

AAGCAGTTCA CCCTCCCGAG TAGCTGGGAC TACAGGTATG TGCCACCACA CCCAGCTAAT 
11460 

TTTGTGTTTT CAGTAGAGAC AGGGTTTCAC CATGTTGGTC AGGCTGGTCT CAAACTCCTG 
11520 

ACCTCAAGCA ATCCACCCAC CTCAGCCTCC CAAAGTGCTG GGATTACAGG CATGAGCCAC 
11580 

TGCACACGGA CCAGATCCAT TGTTTATGTT GCTTCTAGAG TGAGTTTTTA AAACACAAAT 
11640 

TTGACCATAT CTTTCTCCAA TTTAAGTCAG TATTTTTTTT TTCAGGAAAA AACAGTTCAA 
11700 

ACTCTTTAGT CTGCTTACAC AAGGCCTTTG TAGTCTGACT CTTCTTTCCA AGCTTTCATC 
11760 

AAAGTATACT GCAAGTTACA TTTTATGTGA ATTGAATTAG GCAACGGTAT AAAAATTATA 
11820 

GTTTATATGG GCAAAATGGA AATAATGTTA ACTCTTCCAA ATAGTTTATC TAGAATGACA 
11880 

TAATTTCAAA GCTGTCAGGT CAAATGAGTT ATAAACTGTT AACACTATTG CCACATGCAA 
11940 

GTGTCTCTTA TACTTGGTAG AATTATCTGC TTCCATGTCA TTATTATGTA AATTAGACTT 
12000 

TAAATAACTC AGAAGTTCTT CAGACATACA GGTTATTATT GTGCTTTTTA AACATAATTT 
12060 

TAAATAATTT TATATATGAT AATGTTATCC AAGTGCTAAG GGATGTATTG TTACTGCTGT 
12120

GCAAAAAAAA AAAAAAAAAA AACTCCAAAT AAATATGTTG AAACCAAGTT TATATGCAA 
12180 

AAAACAATAT TAAAAAGGCC AAAGTACCAC CATAATAGGC TGTGTGGAGA CGGCAGGCT 
12240 

CAAAACACTA GTAATAATGC TGAGAAAGTT GAAAAAAGAA AGAAAGCAAC AATATGC^T 
12300 

GGTTGTTGTA GGTTTATGTA CTCCAAGAAT ATCTCCTCTC AAACTTTTAC G^TTTTTCC 
12360 

AAGAAAAGTT AACTTTGGCT GGGCGCAGTG GCTCTTGCCT GTAGTCCCAG CCTTTGGGA 
12420 

GCCAAGGCGG GCAGATCACC TGAGGTCAGG AGTTTGAGAC CAGCCTGACC AAAAATGGA 
12480

AAACCCGCCC CCCTCACTAC TAAAAGAATA CAAAATTAGG CCGGGCACAG TGGCTTACC 
12540 

1* CTGTGATCCC AGCACTTTGG GAGGCCGAAG CAGGAAGATC ACCTGAGGTC AGGAGTTCG 

12600 

GACCAGCCAT GGAGAAACCC GTCTCTACTA AAAATACAAA ATTAGCCGGG CGTGGTGGT 
12660 

CATGACTGTA ATCCCAGCTA CTCAGGAGGC TAAGGCAGAG AATCACTTGA ACCCAGGCA 
12720 

TGGAGGTTGC AGTGAGCCGA GATCGTGCCA TTGCACTCCA GCCTGGGCAA CAAGAGCGA 
12780 

ACTCT3TATC CAAAAAACAA AAGAAAAGAA AAGGTAACCT TGAACTATGT GAGATCTTT 
12840 

GAAATGCATT CTTTCTGTAA AATGTGACTA CATTTGCCTT ATTTATGGTA AAAATGTTG 
12900

GGCCTCAAAC AACCCATATT TTCTCGGTCT CCCCGCTGCC TAGCCTTTGT TCACATTGC 
12960 

TCTTCTTGGT GGAAGCTCTT CCTCTGGCCT TGAAAATGCC TGCTTCTCTT TCAAGGTAG< 

13020 

ACAGTCATCA CTTTCTGTGG TAACCTTCTC CAGCACCATC AAA C AG AAAG AATGAATCT< 
13080

TTGTAAATTC AGCTCTTACG TCATTCATTA CATTATTTTG TAACTCTTTA TAGATTCTT< 
13140 

TCTCCCACTA GACTCTGAGT CACTGGAGAG TAGGAGCCAA CTCTCATTCA TGTGTGGTT. 
13200 

GGTCAGCTAC TGGCCACATT CCTGATGCAT AGTTAATGCT CAAACCTTAA CTGGTGAAT< 
13260 

AGCTCAAATA TTGTCCTTCT CTAAATCCAT TCACTCATTG ACTAACTATG TACTCAAAA r 

13320 

AGTAAACACC AGTAATTTAA TCCAATTCCT GCCCATACTG CTTGGTACAT TTCAGGTGAi 
13380 

40 TTAGTTTGAT AAATATGTGT GTATTACATA ATATTAAAGT ATGTACAGAA GATCATGCTi 

13440 

ATCATAATTC ACAACTGATA ACTAATCAAA CATAAATGCT CTCAGGTTAA CAAATGTCTC 
13500 

CCTTCTCAGT TAATGCAGTC ATTAACAAAC ACCTTCTGAT GCTGATAATA GGGCCTTGT'! 
13560 

45 CAGCAATGAA GCCATAAAGG TGAATAAAGA ACATGCCCTC GTGGAGCTCA CAGCCTAGTC 

13620 

ATTATTGTTC TGATTTTTAA TATTAATGTT GGTTTGGGTT TTGGTGAAAA ATGTTTAGAC 
13680 

TTATCTTAGT GATCTTTTCA TCCTTTGCTA TATTATTTTT CTCTAAGAGT CTTCCTTATC 
13740 

CCCTCCTTTA AAAAACTAGG TGATAATTCT AAATTGTAAA TTTAAATATT ATAAATAGO 
13800 

TATAAAATTT AATATTTATA ATATTTAAAT GTTTGATAAA TATTTAAATT TTATAATATT 
13860 

TAAATGTTTA TTTAAATTCA TTTGTACATC AGTTTTTATT TTATTTAAAT GTGTTGGCG? 
13920

GGCATGGTGG CTGACACCTA TAATCCCAGA ACTTTGAGAG GCCAAGTCAG GCAAACCATT
13980 

TGAGCTCAGG AGTTTGAGAC CACCCTGGGC AACGTGGTGA AACCCTGTCT CTACCAAACA 
14040 

TATGAAAACT TATCTGGGTG TGGTGGCACG CATCTGTGGT CCCAGATGGG AGTCCCAGGC 
14100 

TAAGATGGGA GAATCGCTTG AACCCAGGTG AGAGGGGTGG GGTGGATGTT GCAGTGAGCT 
14160 

GAGATCGTGC CACTGCACTC CAACCTGGGT GACAGAGTGA GACTCCATCT CAAAAAAAAA 
14220 

AAATGTTATC TAAATAAGAT AAATTTAATA ACTGTTCGCA CTTAGATGAG CATAAGGAAC 
14280 

TAAACCTAGA TAAAACTATC AAATAAGGCC TGGGTACAGT GACTCATGCC TGTAATCTCA 
14340 

AGCACTTTGG GAGGCCAAAA TTATACAAAG TTAGTTGTAT AACACCAACT AACAACTATT 
14400 

TTGGGGTTAG CTTAATTCAG ATTAATTTTT TTTAAACTGA GTTTTAAATT CCTGCTTACT 
14460 

CTACCATACA TGCTAGGCCT CATATTATGC TAGAAAAATT TTGAGCACAG ATTTATGAAT 
14520 

ACTCTCCTGC ATACCATTTA ATTTTTAAAC AAATTTTAAT GCAGTATATA TGTGCCTTTT 
14580 

TACCAACACA TTAAATAATA AGATCTACTG TGAGGACTAA ATTTCTGTAA TTTCAAAGTA 
14640 

GTAATGAGTT TAAACCATGT CTCAAGATCT CTGCAATAAC TGTAGCACAA CAGAAAATAG 
14700 

GTATTTCTAT TAATGACAGA GTCACAAGTA CTACTAATAA TACTGTGGTT TGTTTCCTGC 
14760 

AACTAATCAT GGGAGGAATG CTAAATTTCA GAGGTTGGTG AAAATACATG TGTATTTTTT 
14820 

TCCCCATCCA AGTTCACAGA TTTCTCACAC TGAGAACTCC TATTCCATAA CAAAATTCTG 
14880 

GAAGCCTGCA CACCGTATTG GAAGAAGGGC AGAAAGGAAA AGCAAATGGA AGGATTTAAA 

14940 

TTTTTTTCAA ATCCTGTATC CCTTGATTTT ACAGCAAGAT TGTATTTATG TATTACTTGT 
15000 

GTTAAAAATA TAGTATAATC GAGACTCCAG ATCAAAAATC ACCGCAGCTC AGGGAGAAAG 
15060 

AGGGCCACCA AATGCCAGAG CCCTTCAGCC TTCTCCCACC CTGCCTGTAC CCTCAGATGG 
15120 

AAGCACTTTT TTATCATTGT TTCACCTTTA GCATTTTGAC AATGAAGTCA CAAACCTTCA 
15180 

GCCTCTCACC CATAGGAACC CACTGGTTGT AAGAGAAGGA TGAAGCCAGT CCTTCCTAAA 
15240 

GGGCACGATT AGATGTGTTT ATGGCATCCT CAGGTGAAAC TATATTTATA TTGACAATAT 
15300 

ATTTATATTT CTCAAGGAAT ACTAGAATAA TGATTCAGTT CAGTACTAGG CCATTTATCT 
15360 

ACCCTTTATA ATATTGTTTA ATGAGAAAAT GCTTTCTATC TTCCAAATAT CTGATGATTT 
15420 

GTAAGAGAAC ACTTAAACAT GGGTATTCAT AAGCTGAAAC TTCTGGCATT TATTGAATGT 
15480 

CAAGATTGTT CATCAGTATA CTAGGTGATT AACTGACCAC TGAACTTGAA GGTAGTATAA 
15540 

AGTAGTAGTA AAAGGTACAA TCATTGTCTC TTAACAGATG GCTCTTTGCT TTCATTAGGA 
15600 

ATAAAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA 
15651 

       Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala
           -35                 -30                 -25

ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G GTAAGGC TAATGCCATA
15702

Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala
    -20                 -15                 -10

GAACAAATAC CAGGTTCAGA TAAATCTATT CAATTAGAAA AGATGTTGTG AGGTGAACTA 
15762

TTAAGTGACT CTTTGTGTCA CCAAATTTCA CTGTAATATT AATGGCTCTT AAAAAAATAG 
15822 

TGGACCTCTA GAAATTAACC ACAACATGTC CAAGGTCTCA GCACCTTGTC ACACCACGTG 
15882 

TCCTGGCACT TTAATCAGCA GTAGCTCACT CTCCAGTTGG CAGTAAGTGC ACATCATGAA

15942 

AATCCCAGTT TTCATGGGAA AATCCCAGTT TTCATTGGAT TTCCATGGGA AAAATCCCAG 
16002 

TACAAAACTG GGTGCATTCA GGAAATACAA TTTCCCAAAG CAAATTGGCA AATTATGTAA 
16062 

GAGATTCTCT AAATTTAGAG TTCCGTGAAT TACACCATTT TATGTAAATA TGTTTGACAA

16122 

GTAAAAATTG ATTCTTTTTT TTTTTTTCTG TTGCCCAGGC TGGAGTGCAG TGGCACAA^C 
16182 

TCTGCTCACT GCAACCTCCA CCTCCTGGGT TCAAGCAATT CTCCTGCCTC AGCCTTCTGA 
16242 

GTAGCTGGGA CTACAGGTGC ATCCCGCCAT GCCTGGCTAA TTTTTGGGTA TTTTTACTAG 
16302 

AGACAGGGTT TTGGCATGTT GTCCAGGCTG GTCTTGGACT CCTGATCTCA GATGATCCTC 
16362 

CTGGCTCGGG CTCCCAAAGT GCTGGGATTA CAGGCATGAA CCACCACACA TGGCCTAAAA 
16422

ATTGATTCTT ATGATTAATC TCCTGTGAAC AATTTGGCTT CATTTGAAAG TTTGCCTTCA 
16482

TTTGAAACCT TCATTTAAAA GCCTGAGCAA CAAAGTGAGA CCCCATCTCT ACAAAAAACT 
16542 

GCAAAATATC CTGTGGACAC CTCCTACCTT CTGTGGAGGC TGAAGCAGGA GGATCACTTG 

16602

AGCCTAGGAA TTTGAGCCTG CAGTGAGCTA TGATCCCACC CCTACACTCC AGCCTGCATG
16662

ACAGTAGACC CTGACACACA CACACAAAAA AAAACCTTCA TAAAAAATTA TTAGTTGACT 

16722 

TTTCTTAGGT GACTTTCCGT TTAAGCAATA AATTTAAAAG TAAAATCTCT AATTTTAGAA

16782 

AATTTATTTT TAGTTACATA TTGAAATTTT TAAACCCTAG GTTTAAGTTT TATGTCTAAA 

16842 

TTACCTGAGA ACACACTAAG TCTGATAAGC TTCATTTTAT GGGCCTTTTG GATGATTATA 

16902 

TAATATTCTG ATGAAAGCCA AGACAGACCC TTAAACCATA AAAATAGGAG TTCGAGAAAG

16962 

AGGAGTAGCA AAAGTAAAAG CTAGAATGAG ATTGAATTCT GAGTCGAAAT ACAAAATTTT 
17022 

ACATATTCTG TTTCTCTCTT TTTCCCCCTC TTAG CT GAA GAT GAT G GTAAAGT 
17075 

Ala Glu Asp Asp Glu 
-10 

AGAAATGAAT TTATTTTTCT TTGCAAACTA AGTATCTGCT TGAGACACAT CTATCTCACC 
17135 

ATTGTCAGCT GAGGAAAAAA AAAAATGGTT CTCATGCTAC CAATCTGCCT TCAAAGAAAT 
17195

GTGGACTCAG TAGCACAGCT TTGGAATGAA GATGATCATA AGAGATACAA AGAAGAACCT 
17255 

CTAGCAAAAG ATGCTTCTCT ATGCCTTAAA AAATTCTCCA GCTCTTAGAA TCTACAAAAT 
17315 

AGACTTTGCC TGTTTCATTG GTCCTAAGAT TAGCATGAAG CCATGGATTC TGTTGTAGGG 
17375

GGAGCGTTGC ATAGGAAAAA GGGATTGAAG CATTAGAATT GTCCAAAATC AGTAACACCT
17435 

CCTCTCAGAA ATGCTTTGGG AAGAAGCCTG GAAGGTTCCG GGTTGGTGGT GGGGTGGGGC 
17495 

AGAAAATTCT GGAAGTAGAG GAGATAGGAA TGGGTGGGGC AAGAAGACCA CATTCAGAGG 
17555 

CCAAAAGCTG AAAGAAACCA TGGCATTTAT GATGAATTCA GGGTAATTCA GAATGGAAGT 
17615 

AGAGTAGGAG TAGGAGACTG GTGAGAGGAG CTAGAGTGAT AAACAGGGTG TAGAGCAAGA 
17675 

CGTTCTCTCA CCCCAAGATG TGAAATTTGG ACTTTATCTT GGAGATAATA GGGTTAATTA 
17735 

AGCACAATAT GTATTAGCTA GGGTAAAGAT TAGTTTGTTG TAACAAAGAC ATCCAAAGAT 
17795 

ACAGTAGCTG AATAAGATAG AGAATTTTTC TCTCAAAGAA AGTCTAAGTA GGCAGCTCAG 
17855 

AAGTAGTATG GCTGGAAGCA ACCTGATGAT ATTGGGACCC CCAACCTTCT TCAGTCTTGT 
17915 

ACCCATCATC CCCTAGTTGT TGATCTCACT CACATAGTTG AAAATCATCA TACTTCCTGG 
17975 

GTTCATATCC CAGTTATCAA GAAAGGGTCA AGAGAAGTCA GGCTCATTCC TTTCAAAGAC 
18035 

TCTAATTGGA AGTTAAACAC ATCAATCCCC CTCATATTCC ATTGACTAGA ATTTAATCAC 
18095 

ATGGCCACAC CAAGTGCAAG GAAATCTGGA AAATATAATC TTTATTCCAG GTAGCCATAT 
18155 

GACTCTTTAA AATTCAGAAA TAATATATTT TTAAAATATC ATTCTGGCTT TGGTATAAAG 
18215 

AATTGATGGT GTGGGGTGAG GAGGCCAAAA TTAAGGGTTG AGAGCCTATT ATTTTAGTTA 
18275 

TTACAAGAAA TGATGGTGTC ATGAATTAAG GTAGACATAG GGGAGTGCTG ATGAGGAGCT
18335 

GTGAATGGAT TTTAGAAACA CTTGAGAGAA TCAATAGGAC ATGATTTAGG GTTGGATTTG 

18395

GAAAGGAGAA GAAAGTAGAA AAGATGATGC CTACATTTTT CACTTAGGCA ATTTGTACCA
18455

TTCAGTGAAA TAGGGAACAC AGGAGGAAGA GCAGGTTTTG GTGTATACAA AGAGGAGGAT 
18515 

GGATGACGCA TTTCGTTTTG GATCTGAGAT GTCTGTGGAA CGTCCTAGTG GAGATGTCCA 
18575 

CAAACTCTTC TACATGTGGT TCTGAGTTCA GGACACAGAT TTGGGCTGGA GATAGAGATA 
18635 

TTGTAGGCTT ATACATAGAA ATGGCATTTG AATCTATAGA GATAAAAAGA CACATCAGAG 
18695 

GAAATGTGTA AAGTGAGAGA GGAAAAGCCA AGTACTGTGC TGGGGGGAAT ACCTACATTT 
18755 

AAAGGATGCA GTAGAAAGAA GCTAATAAAC AACAGAGAGC AGACTAACCA AAAGGGGAGA 
18815 

AGAAAAACCA AGAGAATTCC ACCGACTCCC AGGAGAGCAT TTCAAGATTG AGGGGATAGG 
18875 

TGTTGTGTTG AATTTTGCAG CCTTGAGAAT CAAGGGCCAG AACACAGCTT TTAGATTTAG 
18935 

CAACAAGGAG TTTGGTGATC TCAGTGAAAG CAGCTTGATG GTGAAATGGA GGCAGAGGCA 
18995 

GATTGCAATG AGTGAAACAG TGAATGGGAA GTGAAGAAAT GATACAGATA ATTCTTGCTA 
19055 

AAAGCTTGGC TGTTAAAAGG AGGAGAGAAA CAAGACTAGC TGCAAAGTGA GATTGGGTTG 
19115

ATGGAGCAGT TTTAAATCTC AAAATAAAGA GCTTTGTGCT TTTTTGATTA TGAAAATAAT 
19175 

GTGTTAATTG TAACTAATTG AGGCAATGAA AAAAGATAAT AATATGAAAG ATAAAAATAT
19235

AAAAACCACC CAGAAATAAT GATAGCTACC ATTTTGATAC AATATTTCTA CACTCCTTTC 
19295 

TATGTATATA TACAGACACA GAAATGCTTA TATTTTTATT AAAAGGGATT GTACTATACC 
19355 

TAAGCTGCTT TTTCTAGTTA GTGATATATA TGGACATCTC TCCATGGCAA CGAGTAAT~G 
19415 

CA3TTATATT AAGTTCATGA TATTTCACAA TAAGGGCATA TCTTTGCCCT TTTTATTTAA 
19475 

TCAATTCTTA ATTGGTGAAT GTTTGTTTCC AGTTTGTTGT TGTTATTAAC AATGTTCCCA 
19535

TAAGCATTCC TGTACACCAA TGTTCACACA TTTGTCTGAT TTTTTCTTCA GGATAAAACC 
19595 

CAGGAGGTAG AATTGCTGGG TTGATAGAAG AGAAAGGATG ATTGCCAAAT TAAAGCTTCA 
19655 

GTAGAGGGTA CATGCCGAGC ACAAATGGGA TCAGCCCTAG ATACCAGAAA TGGCACTTTC
19715 

TCATTTCCCC TTGGGACAAA AGGGAGAGAG GCAATAACTG TGCTGCCAGA GTTAAATTTG 
19775 

TACGTGGAGT AGCAGGAAAT CATTTGCTGA AAATGAAAAC AGAGATGATG TTGTAGAGGT 
19835 

CCTGAAGAGA GCAAAGAAAA TTTGAAATTG CGGCTATCAG CTATGGAAGA GAGTGCTGAA 
19895 

CTGGAAAACA AAAGAAGTAT TGACAATTGG TATGCTTGTA ATGGCACCGA TTTGAACGCT 
19955 

TGTGCCATTG TTCACCAGCA GCACTCAGCA GCCAAGTTTG GAGTTTTGTA GCAGAAAGAC 
20015 

AAATAAGTTA GGGATTTAAT ATCCTGGCCA AATGGTAGAC AAAATGAACT CTGAGATCCA 
20075 

GCTGCACAGG GAAGGAAGGG AAGACGGGAA GAGGTTAGAT AGGAAATACA AGAGTCAGGA 
20135 

GACTGGAAGA TGTTGTGATA TTTAAGAACA CATAGAGTTG GAGTAAAAGT GTAAGAAAAC 

20195 

TAGAAGGGTA AGAGACCGGT CAGAAAGTAG GCTATTTGAA GTTAACACTT CAGAGGCAGA 
20255 

GTAGTTCTGA ATGGTAACAA GAAATTGAGT GTGCCTTTGA GAGTAGGTTA AAAAACAATA 
20315 

GGCAACTTTA TTGTAGCTAC TTCTGGAACA GAAGATTGTC ATTAATAGTT TTAGAAAACT
20375 

AAAATATATA GCATACTTAT TTGTCAATTA ACAAAGAAAC TATGTATTTT TAAATGAGAT 
20435 

TTAATGTTTA TTGTAG AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA
20486

Glu Asn Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu
-5 1 5
TCT AAA TTA TCA GTC ATA AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT 
20534 

Ser Lys Leu Ser Val Ile Arg Asn Leu Asn Asp Gln Val Leu Phe Ile

10 15 20

GAC CAA GGA AAT CGG CCT CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT 
20582 

Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp Met Thr Asp Ser Asp Cys

25 30 35 

AGA G GT ATTTTTTTTA ATTCGCAAAC ATAGAAATGA CTAGCTACTT CTTCCCATTC 
20638 
Arg Asp 
40 

TGTTTTACTG CTTACATTGT TCCGTGCTAG TCCCAATCCT CAGATGAAAA GTCACAGGAG 
20698 

TGACAATAAT TTCACTTACA GGAAACTTTA TAAGGCATCC ACGTTTTTTA GTTGGGGTAA
20758

AAAATTGGAT ACAATAAGAC ATTGCTAGGG GTCATGCCTC TCTGAGCCTG CCTTTGAATC
20818

ACCAATCCCT TTATTGTGAT TGCATTAACT GTTTAAAACC TCTATAGTTG GATGCTTAAT 
20878 

CCCTGCTTGT TACAGCTGAA AATGCTGATA GTTTACCAGG TGTGGTGGCA TCTATCTGTA 
20938 

ATCCTAGCTA CTTGGGAGGC TCAAGCAGGA GGATTGCTTG AGGCCAGGAC TTTGAGGCTG 
20998 

TAGTACACTG TGATCGTACC TGTGAATAGC CACTGCACTC CAGCCTGGGT GATATACAGA 
21058 

CCTTGTCTCT AAAATTAAAA AAAAAAAAAA AAAAAACCTT AGGAAAGGAA ATTGATCAAG 
21118 

TCTACTGTGC CTTCCAAAAC ATGAATTCCA AATATCAAAG TTAGGCTGAG TTGAAGCAGT 
21178 

GAATGTGCAT TCTTTAAAAA TACTGAATAC TTACCTTAAC ATATATTTTA AATATTTTAT
21238 

TTAGCATTTA AAAGTTAAAA ACAATCTTTT AGAATTCATA TCTTTAAAAT ACTCAAAAAA 
21298 

GTTGCAGCGT GTGTGTTGTA ATACACATTA AACTGTGGGG TTGTTTGTTT GTTTGAGATG 
21358 

CAGTTTCACT CTGTCACCCA GGCTGAAGTG CAGTGCAGTG CAGTGGTGTG ATCTCGGCTC 
21418 

ACTACAACCT CCACCTCCCA CGTTCAAGCG ATTCTCATGC CTCAGTCTCC CGAGTAGGTG 
21478 

GGATTACAGG CATGCACCAC TTACACCCGG CTAATTTTTG TATTTTTAGT AGAGCTGGGG 
21538 

TTTCACCATG TTGGCCAGGC TGGTCTCAAA CCCCTAACCT CAAGTGATCT GCCTGCCTCA 
21598 

GCCTCCCAAA CAAACAAACA ACCCCACAGT TTAATATGTG TTACAACACA CATGCTGCAA 
21658 

CTTTTATGAG TATTTTAATG ATATAGATTA TAAAAGGTTG TTTTTAACTT TTAAATGCTG 
21718 

GGATTACAGG CATGAGCCAC TGTGCCAGGC CTGAACTGTG TTTTTAAAAA TGTCTGACCA 
21778 

GCTGTACATA GTCTCCTGCA GACTGGCCAA GTCTCAAAGT GGGAACAGGT GTATTAAGGA
21838 

CTATCCTTTG GTTAAATTTC CGCAAATGTT CCTGTGCAAG AATTCTTCTA ACTAGAGTTC 
21898 

TCATTTATTA TATTTATTTC AG AT AAT GCA CCC CGG ACC ATA TTT ATT ATA 
21949 

Asp Asn Ala Pro Arg Thr Ile Phe Ile Ile

40 45 
AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG GCT GTA ACT ATC TCT 
21997 

Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile Ser
50 55 60 65 

GTG AAG TGT GAG AAA ATT TCA ACT CTC TCC TGT GAG AAC AAA ATT ATT 
22045 

Val Lys Cys Glu Lys Ile Ser Thr Leu Ser Cys Glu Asn Lys Ile Ile

70 75 80 

TCC TTT AAG GTAAGACTG AGCCTTACTT TGTTTTCAAT CATGTTAATA TAATCAATAT
22103 

Ser Phe Lys 

AATTAGAAAT ATAACATTAT TTCTAATGTT AATATAAGTA ATGTAATTAG AAAACTCAAA 
22163 

TATCCTCAGA CCAACCTTTT GTCTAGAACA GAAATAACAA GAAGCAGAGA ACCATTAAAG
22223 

TGAATACTTA CTAAAAATTA TCAAACTCTT TACCTATTGT GATAATGATG GTTTTTCTGA 
22283 

GCCTGTCACA GGGGAAGAGG AGATACAACA CTTGTTTTAT GACCTGCATC TCCTGAACAA 
22343

TCAGTCTTTA TACAAATAAT AATGTAGAAT ACATATGTGA GTTATACATT TAAGAATAAC
22403 

ATGTGACTTT CCAGAATGAG TTCTGCTATG AAGAATGAAG CTAATTATCC TTCTATATTT 
22463 

CTACACCTTT GTAAATTATG ATAATATTTT AATCCCTAGT TGTTTTGTTG CTGATCCTTA

22523 

GCCTAAGTCT TAGACACAAG CTTCAGCTTC CAGTTGATGT ATGTTATTTT TAATGTTAAT 
22583 

CTAATTGAAT AAAAGTTATG AGATCAGCTG TAAAAGTAAT GCTATAATTA TCTTCAAGCC 
22643

AGGTATAAAG TATTTCTGGC CTCTACTTTT TCTCTATTAT TCTCCATTAT TATTCTCTAT 
22703 

TATTTTTCTC TATTTCCTCC ATTATTGTTA GATAAACCAC AATTAACTAT AGCTACAGAC 
22763 

TGAGCCAGTA AGAGTAGCCA GGGATGCTTA CAAATTGGCA ATGCTTCAGA GGAGAATTCC 
22823

ATGTCATGAA GACTCTTTTT GAGTGGAGAT TTGCCAATAA ATATCCGCTT TCATGCCCAC 
22883 

CCAGTCCCCA CTGAAAGACA GTTAGGATAT GACCTTAGTG AAGGTACCAA GGGGCAACTT 
22943 

GGTAGGGAGA AAAAAGCCAC TCTAAAATAT AATCCAAGTA AGAACAGTGC ATATGCAACA 
23003 

GATACAGCCC CCAGACAAAT CCCTCAGCTA TCTCCCTCCA ACCAGAGTGC CACCCCTTCA 
23063 

GGTGACAATT TGGAGTCCCC ATTCTAGACC TGACAGGCAG CTTAGTTATC AAAATAGCAT
23123 

AAGAGGCCTG GGATGGAAGG GTAGGGTGGA AAGGGTTAAG CATGCTGTTA CTGAACAACA

23183 

TAATTAGAAG GGAAGGAGAT GGCCAAGCTC AAGCTATGTG GGATAGAGGA AAACTCAGCT 
23243 

GCAGAGGCAG ATTCAGAAAC TGGGATAAGT CCGAACCTAC AGGTGGATTC TTGTTGAGGG 
23303 

AGACTGGTGA AAATGTTAAG AAGATGGAAA TAATGCTTGG CACTTAGTAG GAACTGGGCA

23363

AATCCATATT TGGGGGAGCC TGAAGTTTAT TCAATTTTGA TGGCCCTTTT AAATAAAAAG 
23423 

AATGTGGCTG GGCGTGGTGG CTCACACCTG TAATCCCAGC ACTTTGGGAG GCCGAGGGGG 
23483

GCGGATCACC TGAAGTCAGG AGTTCAAGAC CAGCCTGACC AACATGGAGA AACCCCATCT 
23543 

CTACTAAAAA TACAAAATTA GCTGGGCGTG GTGGCATATG CCTGTAATCC CAGCTACTCG 
23603 

GGAGGCTGAG GCAGGAGAAT CTTTTGAACC CGGGAGGCAG AGGTTGCGAT GAGCCTAGAT 
23663

CGTGCCATTG CACTCCAGCC TGGGCAACAA GAGCAAAACT CGGTCTCAAA AAAAAAAAAA 
23723 

AAAAAGTGAA ATTAACCAAA GGCATTAGCT TAATAATTTA ATACTGTTTT TAAGTAGGGC 
23783 

GGGGGGTGGC TGGAAGAGAT CTGTGTAAAT GAGGGAATCT GACATTTAAG CTTCATCAGC 
23843 

ATCATAGCAA ATCTGCTTCT GGAAGGAACT CAATAAATAT TAGTTGGAGG GGGGGAGAGA 
23903 

GTGAGGGGTG GACTAGGACC AGTTTTAGCC CTTGTCTTTA ATCCCTTTTC CTGCCACTAA 
23963 

TAAGGATCTT AGCAGTGGTT ATAAAAGTGG CCTAGGTTCT AGATAATAAG ATACAACAGG

24023 

CCAGGCACAG TGGCTCATGC CTATAATCCC AGCACTTTGG GAGGGCAAGG CGAGTGTCTC 
24083 

ACTTGAGATC AGGAGTTCAA GACCAGCCTG GCCAGCATGG CGATACTCTG TCTCTACTAA 
24143 

AAAAAATACA AAAATTAGCC AGGCATGGTG GCATGCACCT GTAATCCCAG CTACTCGTGA
24203

GCCTGAGGCA GAAGAATCGC TTGAAACCAG GAGGTGTAGG CTGCAGTGAG CTGAGATCGC 
24263 

ACCACTGCAC TCCAGCCTGG GCGACAGAAT GAGACTTTGT CTCAAAAAAA GAAAAAGATA 
24323 

CAACAGGCTA CCCTTATGTG CTCACCTTTC ACTGTTGATT ACTAGCTATA AAGTCCTATA 
24383 

AAGTTCTTTG GTCAAGAACC TTGACAACAC TAAGAGGGAT TTGCTTTGAG AGGTTACTGT 
24443 

CAGAGTCTGT TTCATATATA TACATATACA TGTATATATG TATCTATATC CAGGCTTGGC

24503 

CAGGGTTCCC TCAGACTTTC CAGTGCACTT GGGAGATGTT AGGTCAATAT CAACTT^CCC 
24563 

TGGATTCAGA TTCAACCCCT TCTGATGTAA AAAAAAAAAA AAAAAAGAAA GAAATCCCTT 
24623 

TCCCCTTGGA GCACTCAAGT TTCACCAGGT GGGGCTTTCC AAGTTGGGGG TTCTCCAAGG

24683 

TCATTGGGAT TGCTTTCACA TCCATTTGCT ATGTACCTTC CCTATGATGG CTGGGAGTGG 
24743 

TCAACATCAA AACTAGGAAA GCTACTGCCC AAGGATGTCC TTACCTCTAT TCTGAAATGT 
24803

GCAATAAGTG TGATTAAAGA GATTGCCTGT TCTACCTATC CACACTCTCG CTTTCAACTG 
24863 

TAACTTTCTT TTTTTCTTTT TTTCTTTTTT TCTTTTTTTT TGAAACGGAG TCTCGCTCTG 
24923 

TCGCCCAGGC TAGAGTGCAG TGGCACGATC TCAGCTCACT GCAAGCTCTG CCTCCCGGGT 
24983

TCACGCCATT CTCCTGCCTC ACCCTCCCAA GCAGCTGGGA CTACAGGCGC CTGCCACCAT 
25043 

GCCCAGCTAA TTTTTTGTAT TTTTAGTAGA GACGGGGTTT CACCGTGTTA GCCAGGATGG 
25103 

TCTCGATCTC CTGAACTTGT GATCCGCCCG CCTCAGCCTC CCAAAGTGCT GGGATTACAG 
25163 

GCGTGAGCCA TCGCACCCGG CTCAACTGTA ACTTTCTATA CTGGTTCATC TTCCCCTGTA 
25223 

ATGTTACTAG AGCTTTTGAA GTTTTGGCTA TGGATTATTT CTCATTTATA CATTAGATTT 
25283 

CAGATTAGTT CCAAATTGAT GCCCACAGCT TAGGGTCTCT TCCTAAATTG TATATTGTAG

25343 

ACAGCTGCAG AAGTGGGTGC CAATAGGGGA ACTAGTTTAT ACTTTCATCA ACTTAGGACC 
25403 

CACACTTGTT GATAAAGAAC AAAGGTCAAG AGTTATGACT ACTGATTCCA CAACTGATTG 
25463 

AGAAGTTGGA GATAACCCCG TGACCTCTGC CATCCAGAGT CTTTCAGGCA TCTTTGAAGG

25523 

ATGAAGAAAT GCTATTTTAA TTTTGGAGGT TTCTCTATCA GTGCTTAGGA TCATGGGAAT 
25583 

CTGTGCTGCC ATGAGGCCAA AATTAAGTCC AAAACATCTA CTGGTTCCAG GATTAACATG 
25643

GAAGAACCTT AGGTGGTGCC CACATGTTCT GATCCATCCT GCAAAATAGA CATGCTGCAC 
25703 

TAACAGGAAA AGTGCAGGCA GCACTACCAG TTGGATAACC TGCAAGATTA TAGTTTCAAG 
25763 

TAATCTAACC ATTTCTCACA AGGCCCTATT CTGTGACTGA AACATACAAG AATCTGCATT 
25823

TGGCCTTCTA AGGCAGGGCC CAGCCAAGGA GACCATATTC AGGACAGAAA TTCAAGACTA 
25883 

CTATGGAACT GGAGTGCTTG GCAGGGAAGA CAGAGTCAAG GACTGCCAAC TGAGCCAATA 
25943 

CAGCAGGCTT ACACAGGAAC CCAGGGCCTA GCCCTACAAC AATTATTGGG TCTATTCACT 
26003






EP 0 816 499 A2 



GTAAGTTTTA ATTTCAGGCT CCACTGAAAG AGTAAGCTAA GATTCCTGGC ACTTTCTGTC 
26063 

TCTCTCACAG TTGGCTCAGA AATGAGAACT GGTCAGGCCA GGCATGGTGG CTTACACCTG 
26123 

GAATCCCAGC ACTTTGGGAG GCCGAAGTGG GAGGGTCACT TGAGGCCAGG AGTTCAGGAC 
26183

CAGCTTAGGC AACAAAGTGA GATACCCCCT GACCCCTTCT CTACAAAAAT AAATTTTAAA 
26243 

AATTAGCCAA ATGTGGTGGT GTATACTTAC AGTCCCAGCT ACTCAGGAGG CTGAGGCAGG 
26303 

GGGATTGCTT GAGCCCAGGA ATTCAAGGCT GCAGTGAGCT ATGATTTCAC CACTGCACTT 
26363 

CTGGCTGGGC AACAGAGCGA GACCCTGTCT CAAAGCAAAA AGAAAAAGAA ACTAGAACTA 
26423 

GCCTAAGTTT GTGGGAGGAG GTCATCATCG TCTTTAGCCG TGAATGGTTA TTATAGAGGA 
26483

CAGAAATTGA CATTAGCCCA AAAAGCTTGT GGTCTTTGCT GGAACTCTAC TTAATCTTGA 
26543 

GCAAATGTGG ACACCACTCA ATGGGAGAGG AGAGAAGTAA GCTGTTTGAT GTATAGGGGA 
26603 

AAACTAGAGG CCTGGAACTG AATATGCATC CCATGACAGG GAGAATAGGA GATTCGGAGT 
26663 

TAAGAAGGAG AGGAGGTCAG TACTGCTGTT CAGAGATTTT TTTTATGTAA CTCTTGAGAA 
26723 

GCAAAACTAC TTTTGTTCTG TTTGGTAATA TACTTCAAAA CAAACTTCAT ATATTCAAAT 
26783 

TGTTCATGTC CTGAAATAAT TAGGTAATGT TTTTTTCTCT ATAG GAA ATG AAT CCT 
26839 



Glu Met Asn Pro
85

CCT GAT AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG AGA
26887

Pro Asp Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln Arg
90 95 100

AGT GTC CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA TAC
26935

Ser Val Pro Gly His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser Tyr
105 110 115 120

GAA GGA TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA CTC
26983

Glu Gly Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys Leu
125 130 135

ATT TTG AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC ACT
27031

Ile Leu Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe Thr
140 145 150

GTT CAA AAC GAA GAC T AGCTATTAAA ATTTCATGCC GGGCGCAGTG GCTCACGCCT 
27087 

Val Gln Asn Glu Asp
155 

GTAATCCCAG CCCTTTGGGA GGCTGAGGCG GGCAGATCAC CAGAGGTCAG GTGTTCAAGA 
27147 

CCAGCCTGAC CAACATGGTG AAACCTCATC TCTACTAAAA ATACAAAAAA TTAGCTGAGT 
27207 

GTAGTGACCC ATGCCCTCAA TCCCAGCTAC TCAAGAGGCT GAGGCAGGAG AATCACTTGC 
27267 

ACTCCGGAGG TGGAGGTTGT GGTGAGCCGA GATTGCACCA TTGCGCTCTA GCCTGGGCAA 
27327 

CAACAGCAAA ACTCCATCTC AAAAAATAAA ATAAATAAAT AAACAAATAA AAAATTCATA 
27387 

ATGTGAACTG TCTGAATTTT TATGTTTAGA AAGATTATGA GATTATTAGT CTATAATTGT
27447

AATGGTGAAA TAAAATAAAT ACCAGTCTTG AAAAACATCA TTAAGAAATG AATGAACTTT 
27507 

CACAAAAGCA AACAAACAGA CTTTCCCTTA TTTAAGTGAA TAAAATAAAA TAAAATAAAA 
27567

TAATGTTTAA AAAATTCATA GTTTGAAAAC ATTCTACATT GTTAATTGGC ATATTAATTA
27627 

TACTTAATAT AATTATTTTT AAATCTTTTG GGTTATTAGT CCTAATGACA AAAGATATTG 
27687 

ATATTTGAAC TTTCTAATTT TTAAGAATAT CGTTAAACCA TCAATATTTT TATAAGGAGG 
27747 

CCACTTCACT TGACAAATTT CTGAATTTCC TCCAAAGTCA GTATATTTTT AAAATTCAGT
27807 

TTGATCCTGA ATCCAGCAAT ATATAAAAGG GATTATATAC TCTGGCCAAC TGACATTCAT 
27867 

CCTAGGAATG CAAAGATGGT TTAATATCCT AAAATCAATT AACATAACAT ACTATATTAA

27927 

TAAAGTATCA AAACAGTATT CTCATCTTTT TTTCTTTTTT CACAATTCCT TGGTTACACT 

27987 

ATCATCTCAA TAGATGCAGA AAAAGCATTT GACAAAATCC AATTCATAAT AAAAATTCTC 
28047 

AAACTTGAAA GAGAACATCA TAAAGGCATC TATGAAAAAC CTACAGCTAA TATCATACTT

28107 

AACGATGAAA AACTGAATTA TTTTACCCTA AGATCAAGAA TAATGCAAGC ATGTCAGCTC 

28167

TTGCAACTTC TATTCAACAT TGTACTGGAG GTTCTAGCCA GAGCAACCAT ACAATAAATA

28227

AAAATAAAAG GCACCCAGAT TAGAAAGGAA GTCTTTATTT GCAGACAACA TGGTTCTTTA 
28287 

TGCAGAAAAC CGTCAGGAAT ACACACACAT GTTAGAACTA ATAAGTTCAG CAAGGTTGCA 

28347 

GGTTGCAATA TCAATATGCA AAAATACATT GAAGGCTGGG CTCAGTGGAG ATGGCATGTA 
28407

CCTTTCGTCC CAGCTACTTG GGAGGCTGAG GTAGGAGGAT CACTTGAGGT GAGGAGTTTG 

28467 

AGGCTATAGT GCAATGTGAT CTTGCCTGTG AATAGCCACT GCACTCGAGC CTAGGCAACA 
28527 

AAGTGAGACC CCGTCTCCAA AAAAAAAAAT GGTATATTGG TATTTCTGTA TATGAACAAT 

28587

GAATGATCTG AAAACAAGAA AATTCCATTC ACGATGGTAT TAAAAAAATA AAATACAAAT 

28647 

AAATTTAGCA AAATAATTAT AAAACTTGTA CATCGAAAAT TTCAAAGCAC TCTGAGGGAA 
28707 

ATTAAAGATG ATCTAAATAA TTGGAGAGAC ACTCTATGAT CACTGATTGG AAAATTCATT

28767 

CAATATTGTT AAGATAACAA TTGTCCCCAA ATTGATGCAT GCATTCAATT TAGTCTTCAT 
28827 

CAAAATTCCA GCAGGGTTTT TGCAGAAATT GACAAGCTGT ACCCAAAATG TATATGGAAA
28887

TGAAAAGACC CAGAAGAGCA AATAATTTTT TAAAAACAAA GTTGGAAAAC TTTTACTTCC

28947 

TAATTTTAAA ACTTACTATA AACCTAAAGT TATCAAGACC ATTTAGT 
28994 



(16) INFORMATION FOR SEQ ID NO: 15:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 10 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: peptide
(v) FRAGMENT TYPE: N-terminal fragment
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15:

Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser 
1 5 10 



Claims 



1. A genomic DNA, which encodes a polypeptide capable of inducing the production of interferon-γ by immunocompetent
cells; said polypeptide possessing an amino acid sequence given in SEQ ID NO:1 (where the symbol "Xaa"
means "isoleucine" or "threonine") or one of functional equivalents thereof;





SEQ ID NO: 1:

Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser Val Ile Arg Asn Leu Asn
1 5 10 15
Asp Gln Val Leu Phe Ile Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp
20 25 30
Met Thr Asp Ser Asp Cys Arg Asp Asn Ala Pro Arg Thr Ile Phe Ile
35 40 45
Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile
50 55 60
Ser Val Lys Cys Glu Lys Ile Ser Xaa Leu Ser Cys Glu Asn Lys Ile
65 70 75 80
Ile Ser Phe Lys Glu Met Asn Pro Pro Asp Asn Ile Lys Asp Thr Lys
85 90 95
Ser Asp Ile Ile Phe Phe Gln Arg Ser Val Pro Gly His Asp Asn Lys
100 105 110
Met Gln Phe Glu Ser Ser Ser Tyr Glu Gly Tyr Phe Leu Ala Cys Glu
115 120 125
Lys Glu Arg Asp Leu Phe Lys Leu Ile Leu Lys Lys Glu Asp Glu Leu
130 135 140
Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu Asp
145 150 155.
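
As an illustrative aside that is not part of the claimed subject matter, the two concrete sequences covered by the "Xaa" position of SEQ ID NO:1 can be enumerated programmatically. The following is a minimal Python sketch; the helper name `expand_xaa` and the variable names are assumptions introduced here for illustration only, and the residue list is transcribed from claim 1.

```python
# Residues of SEQ ID NO:1 in three-letter code, 16 per row as printed in claim 1.
SEQ_ID_NO_1 = """
Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser Val Ile Arg Asn Leu Asn
Asp Gln Val Leu Phe Ile Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp
Met Thr Asp Ser Asp Cys Arg Asp Asn Ala Pro Arg Thr Ile Phe Ile
Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile
Ser Val Lys Cys Glu Lys Ile Ser Xaa Leu Ser Cys Glu Asn Lys Ile
Ile Ser Phe Lys Glu Met Asn Pro Pro Asp Asn Ile Lys Asp Thr Lys
Ser Asp Ile Ile Phe Phe Gln Arg Ser Val Pro Gly His Asp Asn Lys
Met Gln Phe Glu Ser Ser Ser Tyr Glu Gly Tyr Phe Leu Ala Cys Glu
Lys Glu Arg Asp Leu Phe Lys Leu Ile Leu Lys Lys Glu Asp Glu Leu
Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu Asp
""".split()


def expand_xaa(residues, choices=("Ile", "Thr")):
    """Return one full residue list per allowed substitution of 'Xaa'.

    Hypothetical helper: claim 1 states that Xaa means isoleucine or threonine.
    """
    return [[choice if r == "Xaa" else r for r in residues] for choice in choices]


variants = expand_xaa(SEQ_ID_NO_1)
xaa_position = SEQ_ID_NO_1.index("Xaa") + 1  # 1-based residue number

print("length:", len(SEQ_ID_NO_1))
print("Xaa at residue:", xaa_position)
print("variants at that residue:", [v[xaa_position - 1] for v in variants])
```

The two lists in `variants` differ only at the Xaa position; every other residue is shared.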



2. The genomic DNA of claim 1, which comprises two or more exons; each of the exons possessing a part of nucleotide 
sequence given in SEQ ID NO:2; 



SEQ ID NO: 2: 



GCCTGGACAG TCAGCAAGGA ATTGTCTCCC AGTGCATTTT GCCCTCCTGG CTGCCAACTC 60 

TGGCTGCTAA AGCGGCTGCC ACCTGCTGCA GTCTACACAG CTTCGGGAAG AGGAAAGGAA 120 

CCTCAGACCT TCCAGATCGC TTCCTCTCGC AACAAACTAT TTGTCGCAGG AATAAAG 177 

ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA ATG 225
Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala Met
-35 -30 -25

AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA GCT GAA GAT GAT GAA AAC 273
Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala Glu Asp Asp Glu Asn
-20 -15 -10 -5

CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA TCT AAA TTA TCA GTC ATA 321
Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser Val Ile
1 5 10

AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT GAC CAA GGA AAT CGG CCT 369
Arg Asn Leu Asn Asp Gln Val Leu Phe Ile Asp Gln Gly Asn Arg Pro
15 20 25

CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT AGA GAT AAT GCA CCC CGG 417
Leu Phe Glu Asp Met Thr Asp Ser Asp Cys Arg Asp Asn Ala Pro Arg
30 35 40

ACC ATA TTT ATT ATA AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG 465
Thr Ile Phe Ile Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met
45 50 55 60

GCT GTA ACT ATC TCT GTG AAG TGT GAG AAA ATT TCA ACT CTC TCC TGT 513
Ala Val Thr Ile Ser Val Lys Cys Glu Lys Ile Ser Thr Leu Ser Cys
65 70 75

GAG AAC AAA ATT ATT TCC TTT AAG GAA ATG AAT CCT CCT GAT AAC ATC 561
Glu Asn Lys Ile Ile Ser Phe Lys Glu Met Asn Pro Pro Asp Asn Ile
80 85 90

AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG AGA AGT GTC CCA GGA 609
Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln Arg Ser Val Pro Gly
95 100 105

CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA TAC GAA GGA TAC TTT 657
His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser Tyr Glu Gly Tyr Phe
110 115 120

CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA CTC ATT TTG AAA AAA 705
Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys Leu Ile Leu Lys Lys
125 130 135 140

GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC ACT GTT CAA AAC GAA 753
Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe Thr Val Gln Asn Glu
145 150 155


GAC TAGCTATTAA AATTTCATGC CGGGCGCAGT GGCTCACGCC TGTAATCCCA 806 
Asp 

GCCCTTTGGG AGGCTGAGGC GGGCAGATCA CCAGAGGTCA GGTGTTCAAG ACCAGCCTGA 866 

CCAACATGGT GAAACCTCAT CTCTACTAAA AATACTAAAA ATTAGCTGAG TGTAGTGACG 926

CATGCCCTCA ATCCCAGCTA CTCAAGAGGC TGAGGCAGGA GAATCACTTG CACTCCGGAG 986 

GTAGAGGTTG TGGTGAGCCG AGATTGCACC ATTGCGCTCT AGCCTGGGCA ACAACAGCAA 1046 

AACTCCATCT CAAAAAATAA AATAAATAAA TAAACAAATA AAAAATTCAT AATGTGAAAA 1106 
AAAAAAAAAA AAAA 1120.



3. The genomic DNA of claim 1, which comprises two exons with respective nucleotide sequences given in SEQ ID
NOs:3 and 4;



SEQ ID NO: 3: 

AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA TCT AAA TTA TCA 47
Glu Asn Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser
-5 1 5 10

GTC ATA AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT GAC CAA GGA AAT 95
Val Ile Arg Asn Leu Asn Asp Gln Val Leu Phe Ile Asp Gln Gly Asn
15 20 25

CGG CCT CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT AGA G 135
Arg Pro Leu Phe Glu Asp Met Thr Asp Ser Asp Cys Arg Asp
30 35 40


SEQ ID NO: 4:

AT AAT GCA CCC CGG ACC ATA TTT ATT ATA AGT ATG TAT AAA GAT AGC 47
Asp Asn Ala Pro Arg Thr Ile Phe Ile Ile Ser Met Tyr Lys Asp Ser
40 45 50 55

CAG CCT AGA GGT ATG GCT GTA ACT ATC TCT GTG AAG TGT GAG AAA ATT 95
Gln Pro Arg Gly Met Ala Val Thr Ile Ser Val Lys Cys Glu Lys Ile
60 65 70

TCA ACT CTC TCC TGT GAG AAC AAA ATT ATT TCC TTT AAG 134
Ser Thr Leu Ser Cys Glu Asn Lys Ile Ile Ser Phe Lys
80 85.

















4. The genomic DNA of claim 1, which comprises two exons with respective nucleotide sequences given in SEQ ID
NOs:5 and 6;



SEQ ID NO: 5: 

GAATAAAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG 50 
Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val
-35 -30 -25

GCA ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G 87
Ala Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala
-20 -15 -10



SEQ ID NO: 6:

CT GAA GAT GAT G 12 
Ala Glu Asp Asp Glu 
-10. 

5. The genomic DNA of claim 3, which comprises additional two exons with respective nucleotide sequences given
in SEQ ID NOs:5 and 6.
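
As an illustrative aside that is not part of the claimed subject matter, the way the exon fragments of SEQ ID NOs:5, 6 and 3 fit together can be checked by joining them and translating the result. The following is a minimal Python sketch under stated assumptions: the helper names `splice` and `translate` are hypothetical, the standard genetic code is assumed, and only the first printed row of SEQ ID NO:3 is used. The split codons at the exon boundaries (G + CT and G + AA) are completed by the concatenation.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Exon fragments transcribed from the claims (spaces are layout only).
EXON_SEQ_ID_5 = (
    "GAATAAAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG "
    "GCA ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G"
)
EXON_SEQ_ID_6 = "CT GAA GAT GAT G"
EXON_SEQ_ID_3_ROW_1 = "AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA TCT AAA TTA TCA"


def splice(*exons):
    """Join exon fragments in order after removing layout spaces."""
    return "".join(exon.replace(" ", "") for exon in exons)


def translate(dna):
    """Translate complete codons into one-letter residues, stopping at a stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[i:i + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


spliced = splice(EXON_SEQ_ID_5, EXON_SEQ_ID_6, EXON_SEQ_ID_3_ROW_1)
coding = spliced[spliced.index("ATG"):]  # start at the first ATG
peptide = translate(coding)
print(peptide)
```

In this sketch the last ten residues of `peptide` correspond to the N-terminal fragment listed as SEQ ID NO:15 (Tyr Phe Gly Lys Leu Glu Ser Lys Leu Ser), preceded by the 36-residue leader numbered -36 to -1 in the listing.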

6. The genomic DNA of claim 1, which comprises an exon with a part of a nucleotide sequence given in SEQ ID NO:7;

SEQ ID NO: 7:

GAA ATG AAT CCT CCT GAT AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA 48
Glu Met Asn Pro Pro Asp Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile
85 90 95 100

TTC TTT CAG AGA AGT GTC CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA 96
Phe Phe Gln Arg Ser Val Pro Gly His Asp Asn Lys Met Gln Phe Glu
105 110 115

TCT TCA TCA TAC GAA GGA TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC 144
Ser Ser Ser Tyr Glu Gly Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp
120 125 130

CTT TTT AAA CTC ATT TTG AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT 192
Leu Phe Lys Leu Ile Leu Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser
135 140 145

ATA ATG TTC ACT GTT CAA AAC GAA GAC TAGCTAT TAAAATTTCA TGCCGGGCGC 246
Ile Met Phe Thr Val Gln Asn Glu Asp
150 155

AGTGGCTCAC GCCTGTAATC CCAGCCCTTT GGGAGGCTGA GGCGGGCAGA TCACCAGAGG 306
TCAGGTGTTC AAGACCAGCC TGACCAACAT GGTGAAACCT CATCTCTACT AAAAATACAA 366
AAAATTAGCT GAGTGTAGTG ACCCATGCCC TCAATCCCAG CTACTCAAGA GGCTGAGGCA 426
GGAGAATCAC TTGCACTCCG GAGGTGGAGG TTGTGGTGAG CCGAGATTGC ACCATTGCGC 486
TCTAGCCTGG GCAACAACAG CAAAACTCCA TCTCAAAAAA TAAAATAAAT AAATAAACAA 546
ATAAAAAATT CATAATGTGA ACTGTCTGAA TTTTTATGTT TAGAAAGATT ATGAGATTAT 606
TAGTCTATAA TTGTAATGGT GAAATAAAAT AAATACCAGT CTTGAAAAAC ATCATTAAGA 666
AATGAATGAA CTTTCACAAA AGCAAACAAA CAGACTTTCC CTTATTTAAG TGAATAAAAT 726
AAAATAAAAT AAAATAATGT TTAAAAAATT CATAGTTTGA AAACATTCTA CATTGTTAAT 786
TGGCATATTA ATTATACTTA ATATAATTAT TTTTAAATCT TTTGGGTTAT TAGTCCTAAT 846
GACAAAAGAT ATTGATATTT GAACTTTCTA ATTTTTAAGA ATATCGTTAA ACCATCAATA 906
TTTTTATAAG GAGGCCACTT CACTTGACAA ATTTCTGAAT TTCCTCCAAA GTCAGTATAT 966
TTTTAAAATT CAGTTTGATC CTGAATCCAG CAATATATAA AAGGGATTAT ATACTCTGGC 1026
CAACTGACAT TCATCCTAGG AATGCAAAGA TGGTTTAATA TCCTAAAATC AATTAACATA 1086
ACATACTATA TTAATAAAGT ATCAAAACAG TATTCTCATC TTTTTTTCTT TTTTCACAAT 1146
TCCTTGGTTA CACTATCATC TCAATAGATG CAGAAAAAGC ATTTGACAAA ATCCAATTCA 1206
TAATAAAAAT TCTCAAACTT GAAAGAGAAC ATCATAAAGG CATCTATGAA AAACCTACAG 1266
CTAATATCAT ACTTAACGAT GAAAAACTGA ATTATTTTAC CCTAAGATCA AGAATAATGC 1326
AAGCATGTCA GCTCTTGCAA CTTCTATTCA ACATTGTACT GGAGGTTCTA GCCAGAGCAA 1386
CCATACAATA AATAAAAATA AAAGGCACCC AGATTAGAAA GGAAGTCTTT ATTTGCAGAC 1446
AACATGGTTC TTTATGCAGA AAACCGTCAG GAATACACAC ACATGTTAGA ACTAATAAGT 1506
TCAGCAAGGT TGCAGGTTGC AATATCAATA TGCAAAAATA CATTGAAGGC TGGGCTCAGT 1566
GGAGATGGCA TGTACCTTTC GTCCCAGCTA CTTGGGAGGC TGAGGTAGGA GGATCACTTG 1626
AGGTGAGGAG TTTGAGGCTA TAGTGCAATG TGATCTTGCC TGTGAATAGC CACTGCACTC 1686
GAGCCTAGGC AACAAAGTGA GACCCCGTCT CCAAAAAAAA AAATGGTATA TTGGTATTTC 1746
TGTATATGAA CAATGAATGA TCTGAAAACA AGAAAATTCC ATTCACGATG GTATTAAAAA 1806
AATAAAATAC AAATAAATTT AGCAAAATAA TTATAAAACT TGTACATCGA AAATTTCAAA 1866
GCACTCTGAG GGAAATTAAA GATGATCTAA ATAATTGGAG AGACACTCTA TGATCACTGA 1926
TTGGAAAATT CATTCAATAT TGTTAAGATA ACAATTGTCC CCAAATTGAT GCATGCATTC 1986
AATTTAGTCT TCATCAAAAT TCCAGCAGGG TTTTTGCAGA AATTGACAAG CTGTACCCAA 2046
AATGTATATG GAAATGAAAA GACCCAGAAG AGCAAATAAT TTTTTAAAAA CAAAGTTGGA 2106
AAACTTTTAC TTCCTAATTT TAAAACTTAC TATAAACCTA AAGTTATCAA GACCATTTAG 2166
T 2167.



7. The genomic DNA of claim 3, which comprises additional one exon with a part of a nucleotide sequence given in
SEQ ID NO:7.

8. The genomic DNA of claim 5, which comprises additional one exon with a part of a nucleotide sequence given in
SEQ ID NO:7.

The genomic DNA of claim 1 , which comprises two introns with respective nucleotide sequences given in SEQ ID 
NOs:8 and 9; 
SEQ ID NO: 8: 

GTATTTTTTT TAATTCGCAA ACATAGAAAT GACTAGCTAC TTCTTCCCAT TCTGTTTTAC 60 
TGCTTACATT GTTCCGTGCT AGTCCCAATC CTCAGATGAA AAGTCACAGG AGTGACAATA 120 
ATTTCACTTA CAGGAAACTT TATAAGGCAT CCACGTTTTT TAGTTGGGGT AAAAAATTGG 180 
ATACAATAAG ACATTGCTAG GGGTCATGCC TCTCTGAGCC TGCCTTTGAA TCACCAATCC 240 
CTTTATTGTG ATTGCATTAA CTGTTTAAAA CCTCTATAGT TGGATGCTTA ATCCCTGCTT 300 
GTTACAGCTG AAAATGCTGA TAGTTTACCA GGTGTGGTGG CATCTATCTG TAATCCTAGC 360 
TACTTGGGAG GCTCAAGCAG GAGGATTGCT TGAGGCCAGG ACTTTGAGGC TGTAGTACAC 420 
TGTGATCGTA CCTGTGAATA GCCACTGCAC TCCAGCCTGG GTGATATACA GACCTTGTCT 480 
CTAAAATTAA AAAAAAAAAA AAAAAAAACC TTAGGAAAGG AAATTGATCA AGTCTACTGT 540 
GCCTTCCAAA ACATGAATTC CAAATATCAA AGTTAGGCTG AGTTGAAGCA GTGAATGTGC 600 
ATTCTTTAAA AATACTGAAT ACTTACCTTA ACATATATTT TAAATATTTT ATTTAGCATT 660 
TAAAAGTTAA AAACAATCTT TTAGAATTCA TATCTTTAAA ATACTCAAAA AAGTTGCAGC 720 
GTGTGTGTTG TAATACACAT TAAACTGTGG GGTTGTTTGT TTGTTTGAGA TGCAGTTTCA 780 
CTCTGTCACC CAGGCTGAAG TGCAGTGCAG TGCAGTGGTG TGATCTCGGC TCACTACAAC 840 
CTCCACCTCC CACGTTCAAG CGATTCTCAT GCCTCAGTCT CCCGAGTAGG TGGGATTACA 900 
GGCATGCACC ACTTACACCC GGCTAATTTT TGTATTTTTA GTAGAGCTGG GGTTTCACCA 960 
TGTTGGCCAG GCTGGTCTCA AACCCCTAAC CTCAAGTGAT CTGCCTGCCT CAGCCTCCCA 1020 
AACAAACAAA CAACCCCACA GTTTAATATG TGTTACAACA CACATGCTGC AACTTTTATG 1080 
AGTATTTTAA TGATATAGAT TATAAAAGGT TGTTTTTAAC TTTTAAATGC TGGGATTACA 1140 
GGCATGAGCC ACTGTGCCAG GCCTGAACTG TGTTTTTAAA AATGTCTGAC CAGCTGTACA 1200 
TAGTCTCCTG CAGACTGGCC AAGTCTCAAA GTGGGAACAG GTGTATTAAG GACTATCCTT 1260 
TGGTTAAATT TCCGCAAATG TTCCTGTGCA AGAATTCTTC TAACTAGAGT TCTCATTTAT 1320 
TATATTTATT TCAG 1334 

SEQ ID NO: 9: 

GTAAGACTGA GCCTTACTTT GTTTTCAATC ATGTTAATAT AATCAATATA ATTAGAAATA 60 
TAACATTATT TCTAATGTTA ATATAAGTAA TGTAATTAGA AAACTCAAAT ATCCTCAGAC 120 
CAACCTTTTG TCTAGAACAG AAATAACAAG AAGCAGAGAA CCATTAAAGT GAATACTTAC 180 
TAAAAATTAT CAAACTCTTT ACCTATTGTG ATAATGATGG TTTTTCTGAG CCTGTCACAG 240 
GGGAAGAGGA GATACAACAC TTGTTTTATG ACCTGCATCT CCTGAACAAT CAGTCTTTAT 300 
ACAAATAATA ATGTAGAATA CATATGTGAG TTATACATTT AAGAATAACA TGTGACTTTC 360 
CAGAATGAGT TCTGCTATGA AGAATGAAGC TAATTATCCT TCTATATTTC TACACCTTTG 420 
TAAATTATGA TAATATTTTA ATCCCTAGTT GTTTTGTTGC TGATCCTTAG CCTAAGTCTT 480 
AGACACAAGC TTCAGCTTCC AGTTGATGTA TGTTATTTTT AATGTTAATC TAATTGAATA 540 
AAAGTTATGA GATCAGCTGT AAAAGTAATG CTATAATTAT CTTCAAGCCA GGTATAAAGT 600 
ATTTCTGGCC TCTACTTTTT CTCTATTATT CTCCATTATT ATTCTCTATT ATTTTTCTCT 660 
ATTTCCTCCA TTATTGTTAG ATAAACCACA ATTAACTATA GCTACAGACT GAGCCAGTAA 720 
GAGTAGCCAG GGATGCTTAC AAATTGGCAA TGCTTCAGAG GAGAATTCCA TGTCATGAAG 780 
ACTCTTTTTG AGTGGAGATT TGCCAATAAA TATCCGCTTT CATGCCCACC CAGTCCCCAC 840 
TGAAAGACAG TTAGGATATG ACCTTAGTGA AGGTACCAAG GGGCAACTTG GTAGGGAGAA 900 
AAAAGCCACT CTAAAATATA ATCCAAGTAA GAACAGTGCA TATGCAACAG ATACAGCCCC 960 
CAGACAAATC CCTCAGCTAT CTCCCTCCAA CCAGAGTGCC ACCCCTTCAG GTGACAATTT 1020 
GGACTCCCCA TTCTAGACCT GACAGGCACC TTAGTTATCA AAATAGCATA AGAGGCCTGG 1080 
GATGGAAGGG TAGGGTGGAA AGGGTTAAGC ATGCTGTTAC TGAACAACAT AATTAGAAGG 1140 
GAAGGAGATG GCCAAGCTCA AGCTATGTGG GATAGAGGAA AACTCAGCTG CAGAGGCAGA 1200 
TTCAGAAACT GGGATAAGTC CGAACCTACA GGTGGATTCT TGTTGAGGGA GACTGGTGAA 1260 
AATGTTAAGA AGATGGAAAT AATGCTTGGC ACTTAGTAGG AACTGGGCAA ATCCATATTT 1320 
GGGGGAGCCT GAAGTTTATT CAATTTTGAT GGCCCTTTTA AATAAAAAGA ATGTGGCTGG 1380 
GCGTGGTGGC TCACACCTGT AATCCCAGCA CTTTGGGAGG CCGAGGGGGG CGGATCACCT 1440 
GAAGTCAGGA GTTCAAGACC AGCCTGACCA ACATGGAGAA ACCCCATCTC TACTAAAAAT 1500 
ACAAAATTAG CTGGGCGTGG TGGCATATGC CTGTAATCCC AGCTACTCGG GAGGCTGAGG 1560 
CAGGAGAATC TTTTGAACCC GGGAGGCAGA GGTTGCGATG AGCCTAGATC GTGCCATTGC 1620 
ACTCCAGCCT GGGCAACAAG AGCAAAACTC GGTCTCAAAA AAAAAAAAAA AAAAGTGAAA 1680 
TTAACCAAAG GCATTAGCTT AATAATTTAA TACTGTTTTT AAGTAGGGCG GGGGGTGGCT 1740 
GGAAGAGATC TGTGTAAATG AGGGAATCTG ACATTTAAGC TTCATCAGCA TCATAGCAAA 1800 
TCTGCTTCTG GAAGGAACTC AATAAATATT AGTTGGAGGG GGGGAGAGAG TGAGGGGTGG 1860 
ACTAGGACCA GTTTTAGCCC TTGTCTTTAA TCCCTTTTCC TGCCACTAAT AAGGATCTTA 1920 
GCAGTGGTTA TAAAAGTGGC CTAGGTTCTA GATAATAAGA TACAACAGGC CAGGCACAGT 1980 
GCCTCATGCC TATAATCCCA GCACTTTGGG AGGGCAAGGC CAGTGTCTCA CTTGAGATCA 2040 
GGAGTTCAAG ACCAGCCTGG CCAGCATGGC GATACTCTGT CTCTACTAAA AAAAATACAA 2100 
AAATTAGCCA GGCATGGTGG CATGCACCTG TAATCCCAGC TACTCGTGAG CCTGAGGCAG 2160 
AAGAATCGCT TGAAACCAGG AGGTGTAGGC TGCAGTGAGC TGAGATCGCA CCACTGCACT 2220 
CCAGCCTGGG CGACAGAATG AGACTTTGTC TCAAAAAAAG AAAAAGATAC AACAGGCTAC 2280 
CCTTATGTGC TCACCTTTCA CTGTTGATTA CTAGCTATAA AGTCCTATAA AGTTCTTTGG 2340 
TCAAGAACCT TGACAACACT AAGAGGGATT TGCTTTGAGA GGTTACTGTC AGAGTCTGTT 2400 
TCATATATAT ACATATACAT GTATATATGT ATCTATATCC AGGCTTGGCC AGGGTTCCCT 2460 
CAGACTTTCC AGTGCACTTG GGAGATGTTA GGTCAATATC AACTTTCCCT GGATTCAGAT 2520 
TCAACCCCTT CTGATGTAAA AAAAAAAAAA AAAAAGAAAG AAATCCCTTT CCCCTTGGAG 2580 
CACTCAAGTT TCACCAGGTG GGGCTTTCCA AGTTGGGGGT TCTCCAAGGT CATTGGGATT 2640 
GCTTTCACAT CCATTTGCTA TGTACCTTCC CTATGATGGC TGGGAGTGGT CAACATCAAA 2700 
ACTAGGAAAG CTACTGCCCA AGGATGTCCT TACCTCTATT CTGAAATGTG CAATAAGTGT 2760 
GATTAAAGAG ATTGCCTGTT CTACCTATCC ACACTCTCGC TTTCAACTGT AACTTTCTTT 2820 
TTTTCTTTTT TTCTTTTTTT CTTTTTTTTT GAAACGGAGT CTCGCTCTGT CGCCCAGGCT 2880 
AGAGTGCAGT GGCACGATCT CAGCTCACTG CAAGCTCTGC CTCCCGGGTT CACGCCATTC 2940 
TCCTGCCTCA CCCTCCCAAG CAGCTGGGAC TACAGGCGCC TGCCACCATG CCCAGCTAAT 3000 
TTTTTGTATT TTTAGTAGAG ACGGGGTTTC ACCGTGTTAG CCAGGATGGT CTCGATCTCC 3060 
TGAACTTGTG ATCCGCCCGC CTCAGCCTCC CAAAGTCCTG GGATTACAGG CGTCAGCCAT 3120 
CGCACCCGGC TCAACTGTAA CTTTCTATAC TGGTTCATCT TCCCCTGTAA TGTTACTAGA 3180 
GCTTTTGAAG TTTTGGCTAT GGATTATTTC TCATTTATAC ATTAGATTTC AGATTAGTTC 3240 
CAAATTGATG CCCACAGCTT AGGGTCTCTT CCTAAATTGT ATATTGTAGA CAGCTGCAGA 3300 
ACTGGGTGCC AATAGGGGAA CTAGTTTATA CTTTCATCAA CTTAGGACCC ACACTTGTTG 3360 
ATAAAGAACA AAGGTCAAGA GTTATGACTA CTGATTCCAC AACTGATTGA GAAGTTGGAG 3420 
ATAACCCCGT GACCTCTGCC ATCCAGAGTC TTTCAGGCAT CTTTGAAGGA TGAAGAAATG 3480 
CTATTTTAAT TTTGGAGGTT TCTCTATCAG TGCTTAGGAT CATGGGAATC TGTGCTGCCA 3540 
TGAGGCCAAA ATTAAGTCCA AAACATCTAC TGGTTCCAGG ATTAACATGG AAGAACCTTA 3600 
GGTGGTGCCC ACATGTTCTG ATCCATCCTG CAAAATAGAC ATGCTGCACT AACAGGAAAA 3660 
GTGCAGGCAG CACTACCAGT TGGATAACCT GCAAGATTAT AGTTTCAAGT AATCTAACCA 3720 
TTTCTCACAA GGCCCTATTC TGTGACTGAA ACATACAAGA ATCTGCATTT GGCCTTCTAA 3780 
GGCAGGGCCC AGCCAAGGAG ACCATATTCA GGACAGAAAT TCAAGACTAC TATGGAACTG 3840 
GAGTGCTTGG CAGGGAAGAC AGAGTCAAGG ACTGCCAACT GAGCCAATAC AGCAGGCTTA 3900 
CACAGGAACC CAGGGCCTAG CCCTACAACA ATTATTGGGT CTATTCACTG TAAGTTTTAA 3960 
TTTCAGGCTC CACTGAAAGA GTAAGCTAAG ATTCCTGGCA CTTTCTGTCT GTCTCACAGT 4020 
TGGCTCAGAA ATGAGAACTG GTCAGGCCAG GCATGGTGGC TTACACCTGG AATCCCAGCA 4080 
CTTTGGGAGG CCGAAGTGGG AGGGTCACTT GAGGCCAGGA GTTCAGGACC AGCTTAGGCA 4140 
ACAAAGTGAG ATACCCCCTG ACCCCTTCTC TACAAAAATA AATTTTAAAA ATTAGCCAAA 4200 
TGTGGTGGTG TATACTTACA GTCCCAGCTA CTCAGGAGGC TGAGGCAGGG GGATTGCTTG 4260 
AGCCCAGGAA TTCAAGGCTG CAGTGAGCTA TGATTTCACC ACTGCACTTC TGGCTGGGCA 4320 
ACAGAGCGAG ACCCTGTCTC AAAGCAAAAA GAAAAAGAAA CTAGAACTAG CCTAAGTTTG 4380 
TGGGAGGAGG TCATCATCGT CTTTAGCCGT GAATGGTTAT TATAGAGGAC AGAAATTGAC 4440 
ATTAGCCCAA AAAGCTTGTG GTCTTTGCTG GAACTCTACT TAATCTTGAG CAAATGTGGA 4500 
CACCACTCAA TGGGAGAGGA GAGAAGTAAG CTGTTTGATG TATAGGGGAA AACTAGAGGC 4560 
CTGGAACTGA ATATGCATCC CATGACAGGG AGAATAGGAG ATTCGGAGTT AAGAAGGAGA 4620 
GGAGGTCAGT ACTGCTGTTC AGAGATTTTT TTTATGTAAC TCTTGAGAAG CAAAACTACT 4680 
TTTGTTCTGT TTGGTAATAT ACTTCAAAAC AAACTTCATA TATTCAAATT GTTCATGTCC 4740 
TGAAATAATT AGGTAATGTT TTTTTCTCTA TAG 4773. 



10. The genomic DNA of claim 1, which comprises three introns with respective nucleotide sequences given in SEQ ID NOs: 
10 to 12; 

SEQ ID NO: 10: 

GTAAGAAATA TCATTCCTCT TTATTTGGAA AGTCAGCCAT GGCAATTAGA GGTAAATAAG 60 
CTAGAAAGCA ATTGAGAGGA ATATAAACCA TCTAGCATCA CTACGATGAG CAGTCAGTAT 120 

CAACATAAGA AATATAAGCA AAGTCAGAGT AGAATTTTTT TCTTTTATCA GATATGGGAG 180 
AGTATCACTT TAGAGGAGAG GTTCTCAAAC TTTTTGCTCT CATGTTCCCT TTACACTAAG 240 
CACATCACAT GTTAGCATAA GTAACATTTT TAATTAAAAA TAACTATGTA CTTTTTTAAC 300 
AACAAAAAAA AGCATAAAGA GTGACACTTT TTTATTTTTA CAAGTGTTTT AACTGGTTTA 360 
ATAGAAGCCA TATAGATCTG CTGGATTCTC ATCTGCTTTG CATTCAGACT ACTGCAATAT 420 
TGCACAGAAT GCAGCCTCTG GTAAACTCTG TTGTACACTC ATGAGAGAAT GGGTGAAAAA 480 
GACAAATTAC GTCTTAGAAT TATTAGAAAT AGCTTTCACT TTAGGAACTC CCTGAGAATT 540 
GCTGCTTTAG AGTGGTAAGA TAAATAAGCT TCTCTTTAAA CGGAATCTCA AGACAGAATC 600 
AGTTACATTA AAAGCAAACA AAAAATTTGC CCATGGTTAG TCATCTTGTG AAATCTGCCA 660 
CACCTTTGGA CTGGGCTACA ATTGGATAAT ATAGCATTCC CCGAGATAAT TTTCTCTCAC 720 
AATTAAGGAA AGGGCTGAAT AAATATCTCT GTTTGAAGTT GAATAACAAA AATTAGGACC 780 

CCCTAAATTT TAGGGCTCCT GAAATTCGTC TTTTTGCCTA TATTCAGCTA CTTTACGTTC 840 
TATTAAATCT TCTTTCAGGC CAGGTGCACT AGCTCATGCC TAGAATCTCA GGCAGGCCTG 900 
AGCCCAGGAA TTTGAGACCA GCCAGGGCAA CACAGTCTCT ACAAAAAAAT AAAAAATTAC 960 
CTGGGTGTGT TGGTGCATGC CTGTAGAACT ACTCAGGATG CTGAGGACTG CTTGAGCCCA 1020 
GGATAGCCAA ATCTGTGGTG AGTTCAGCCA CTAAACAGAG CGAGACTTTC TCAAAAAAAC 1080 
AAACAAAAAA ACAAACAAAC TTCCTTCAAA ATAACTTTTT ATCTGCAATG TTTTCCTATT 1140 

GCCTGTGAGA TTAAATTTAC TCTTTTACCT GATTTCCAAA GCCCTCCATA ATCTAATCCG 1200 
ACTTTACCTT GTGTTCACTG CAAAATAGCA GGACTGTTCC ACTACAATCC AAAAATCACA 1260 
GGTTGGGTGC AGTGGCTCAC TCCTGTAATC CCAACACTTT GGAAGGCCAA GGCAGGTGGA 1320 
TTGCTTCAGC TCAGGAGTTC AAGACCAGCC TGGGCAACAT GGCAAAAACC CTGTCTCTCC 1380 
AAAACATACA AAAATTAGCC AGATGTGGTA GTATGTGCCT GTAGTCCCAA CTACTCAAAA 1440 

GGCTAAGGCA AGAGGATCAC TTGAGCCCAG GAGGTCAAGG CTACAGTGAG CCATGTTTAC 1500 
TGTGTCACTG CACTCCAGCC TGGGTGATAG AGCAAGACCA TGTCTCAAAA AAAAAAAAAA 1560 
GAAAAGAAAA GAAAAAAACA TCGCTCTATT CAGTTCACCC CCACCACAAC ATTGTTTTGA 1620 
TTATCACATA AATGCTGGTC CATTGCCTTC TCTATCTATT CAAATCTTTA AGCATTCTTT 1680 
GAGATTCAAC TCAATTCTCC TTTTCAAACT AGGCCATTTA AACTACATCA GTTCCATTTT 1740 

GATTTTCTTG CTTTGAGTCT ACAGACTCAA AAACAAAAAC TTAAAAACTT ATTTTTTAAG 1800 
TTTTCTGCTA CTCTCACTTC TTCAACACTC ACATACACGC ATTCATAATA AGATGGCAGA 1860 
ATGTTCAAGG ATAAAATGAT TTATAGAACT GAAAAGTTAG GTTTTGATCT TGTTGCTGTC 1920 
AAGATGACTA CCTACCTGAT CTCAGGTAAT TAATTATGTA GCATGCTCCC TCATTTCATC 1980 
CCATACCTAT TCAACAGGAT TGGAATTCCA CAGCAAGGAT AAACATAATC ATAGTTGCTT 2040 
TTCAAGTTCA AGGCATTTTA ACTTTTAATC TAGTAGTATG TTTGTTGTTG TTGTTGTTGT 2100 
TTGAGATGGA GCCCTGCTGT GTCACCCAGG CTGGAGTGCA GTGGCACGAA CTCGGCTCAC 2160 
TGCAACCTCT GCCTCATGGG TTCAATCAGT TATTCTGCCT CAGTGTCCCA AGTAGCTGGG 2220 
ACTACAAGGC ACATGCCACC ATGCCTGGCT AATTTTTGTA TTTTTAGTAG AAACAGGGCT 2280 
TCACCATGTT GGCCAGGCTG GTCTCGAACT CCTGACCTCA AGTGATCCAG CCGCCTCGGC 2340 
CTCCCAAAGT GCTGGGATTA CAGGCATAAG CCACCGTGCC CAGCCTAATA GTATGTTTTT 2400 
AAACTCTTAG TGGCTTAACA ATGCTGGTTG TATAATAAAT ATGCCATAAA TATTTACTGT 2460 
CTTAGAATTA TGAAGAAGTG GTTACTAGGC CGTTTGCCAC ATATCAATGG TTCTCTCCTT 2520 
ACAGCTTTAA TTAGAGTCTA GAATTGCAGG TTGGTAGAGC TGGAACAGAC CTTAAAGATT 2580 
GACTAGCCAA CTTCCTTGTC CAAATGAGGG AACTGAGACC CTTAAAATTA AGTGACTTGC 2640 
CCCAGACAAA ACTGGAACTC ATGTGTCCTA ATTTCCATCA TGAAATTCTA CCATTCACTA 2700 
GCCTCTGGCT AGTTGTCAAA GTATTGCATA ACTAAATTTT TATGTCTGTT TTAAAGAACA 2760 
AATTGTCACT GCTTACTCCT GGGAGGGTCT TTCTGAGGTG GTTTATAACT CTTAAAAAAA 2820 
AAAAAGTCAG TAGTCTGAGA ATTTTAGACG AAATAGTCAA AGCATTTTTA TCCAATGGAT 2880 
CTATAATTTT CATAGATTAG AGTTAAATCA AAGAAACACG GATGAGAAAG GAAGAGGAAA 2940 
ATTGAGGAGA GGAGGAATGG GGATGAGAAC ACACTACTTG TAATCAGTCA TAGATGTACT 3000 
GAGAACTAAC AAGAAGAATT GTAAGAAAAT AAGAATGAAG AATTCAAAAT CAACACATGA 3060 
AATAAAAAGA AACTACTAGG GAAAAATGGA GAAGACATTA GAAAAATTAT TCTATTTTTA 3120 
AAATTCTGTT TTCAGGCTTC CCTCCTGTTC TTCCTCCTTC TCATTGGTTT TCAGGTGGAG 3180 
GGAAAGTTTA AGATGGAAAA AATATATATA TTCTACACAT CCCTTTCTAC GCTGTTGTCA 3240 
TGGCAACAAG GTTTATCATA GCAAACTTTT ATTCATACAA CATTTATTGA GTTCTTACTG 3300 
TGTGGTAAGC TCTTTCCAGG TGTTGAAAAT TCAGGCGAAA AAAGACAACT CATTGTCTTA 3360 
AAACTCAGAT GAAAGCTGAA CAGACCTATT TTTAATCAAA GTAATCTCAA TTTAGGGTAG 3420 
TAAGAGCTAT TTAAGAAGCA TGAACAGGTG TGAAGGAGGT AGGACTCTGA GGAGAGAATA 3480 
GTTAGCTAGG AATGAAAGAG CAGAGAAGTT TTCCTAGAGG AACTATTAAA GCTGGGAGTT 3540 
ACGGGATGAA AGATGAGGCA GGGTTTGCAG GCAAAAAAAA AAAAAAGGCA GGGGAAGGGG 3600 
AAGTTCTGGC CTGGCAGAGA GAATAACTGT GGCAACAATG GAGGAGAGTC TGGAAGCAAG 3660 
AAAACCAAGT AGAAGAGTAT TAAAATAGAA GATGCCAGGG GTAATGAGGG CTTGATTTAA 3720 
AACAGTGCTG TTGGAGATGG AGAGGAGATA CCAAATTCTG GAGACATTTC TGAGTTAGAA 3780 
CCTACAGTAT TTATCAGACA AGGGAAAGAT TAGACAAAGG AGTTAAGAAT GACTCCCAGG 3840 
TTTCAGTTTG GGGCAGGTAA CTAGGACATG TTTTGAAAAG TAATGTATTG GATCTCTTAC 3900 
CATTGGAACT ATGTATGTGG AGCCAAATTA AAATTTGTAC ATGTATATAA CTCTCCCCCC 3960 
ACCACCAGTA ACTACTTCCC TAACTCTCTA CTTTGTAGCC AGACTTCCTA AAAGAATAGT 4020 
TTGTAGTCAC TGTCTTTACT TTTCCCCTCC CATTCTGTCC TAGATATTTG TCCACCTACC 4080 
ATCTGCTGCC TCCACTTTAC CCAAACTGTT CTACGGTTGC CCAAAACTTC CTAATTGCCA 4140 
AATTCAATGA ACAAGTTTAA GCTTATATGT AAATTAGGAG CTCTACAGTT TGATTTCGAG 4200 
CAGCCCCTCC TGAAACCCTT TCTCTTTCGA CTTCTGTGAC ACATCTCAGA TTTACAAAAC 4260 
TGAACTAATT ATTTTACACT TGAGCTGTAT TTTCGTTCTT CTTTCTTGAT GAATGAGGTA 4320 
ACCACTCAAC AAATTGCCCA AGCCAAAAAC TACGAAGTCA TCCTCAGTTC CTCCTTCTTC 4380 
TGTTTGACCC ACAACAGATC AGCTGAGAAA TCCCGCTGTT TAGTATCTCT TGAATTCATT 4440 
ACCTTAATTT ATAGCCTCAT CAACTCTTAA TTGTTAAAAT TACTTCAGTA GTTGTTGTCT 4500 
GACCTCTGTC CAATCTTGTT CAATCAGGTC CATTCTTTTG TTCTTGGTGG TGGTGGTGGT 4560 
GTTGACAGAG TTTCGCTTTT GCTGCCCAGG CTGAAGTGCA GTGGAGCACT TCACTGCAAC 4620 
CACAGCCTCC TGGGTTTAAG CAGTTCACCC TCCCGAGTAG CTGGGACTAC AGGTATGTGC 4680 
CACCACACCC AGCTAATTTT GTGTTTTCAG TAGAGACAGG GTTTCACCAT GTTGGTCAGG 4740 
CTGGTCTCAA ACTCCTGACC TCAAGCAATC CACCCACCTC AGCCTCCCAA AGTGCTGGGA 4800 
TTACAGGCAT GAGCCACTGC ACACGGACCA GATCCATTGT TTATGTTGCT TCTAGAGTGA 4860 
GTTTTTAAAA CACAAATTTG ACCATATCTT TCTCCAATTT AAGTCAGTAT TTTTTTTTTC 4920 
AGGAAAAAAC AGTTCAAACT CTTTAGTCTG CTTACACAAG GCCTTTGTAG TCTGACTCTT 4980 
CTTTCCAAGC TTTCATCAAA GTATACTGCA AGTTACATTT TATGTGAATT GAATTAGGCA 5040 
ACGGTATAAA AATTATAGTT TATATGGGCA AAATGGAAAT AATGTTAACT CTTCCAAATA 5100 
GTTTATCTAG AATGACATAA TTTCAAAGCT GTCAGGTCAA ATGAGTTATA AACTGTTAAC 5160 
ACTATTGCCA CATGCAAGTG TCTCTTATAC TTGGTAGAAT TATCTGCTTC CATGTCATTA 5220 
TTATGTAAAT TAGACTTTAA ATAACTCAGA AGTTCTTCA3 ACATACAGGT TATTATTGTG 5280 
CTTTTTAAAC ATAATTTTAA ATAATTTTAT ATATGATAAT GTTATCCAAG TGCTAAGGGA 5340 
TGTATTGTTA CTGCTGTGCA AAAAAAAAAA AAAAAAAAAC TCCAAATAAA TATGTTGAAA 5400 
CCAAGTTTAT ATGCAAGAAA ACAATATTAA AAAGGCCAAA GTACCACCAT AATAGGCTGT 5460 
GTGGAGACGG CAGGCTACAA AACACTAGTA ATAATGCTGA GAAAGTTGAA AAAAGAAAGA 5520 
AAGCAACAAT ATGCTTTGGT TGTTGTAGGT TTATGTACTC CAAGAATATC TCCTCTCAAA 5580 
CTTTTACGTT TTTTCCAAAG AAAAGTTAAC TTTGGCTGGG CGCAGTGGCT CTTGCCTGTA 5640 
GTCCCAGCCT TTGGGAGGCC AAGGCGGGCA GATCACCTGA GGTCAGGAGT TTGAGACCAG 5700 
CCTGACCAAA AATGGAGAAA CCCGCCCCCC TCACTACTAA AAGAATACAA AATTAGGCCG 5760 
GGCACAGTGG CTTACCCCTG TGATCCCAGC ACTTTGGGAG GCCGAAGCAG GAAGATCACC 5820 
TGAGGTCAGG AGTTCGAGAC CAGCCATGGA GAAACCCGTC TCTACTAAAA ATACAAAATT 5880 
AGCCGGGCGT GGTGGTGCAT GACTGTAATC CCAGCTACTC AGGAGGCTAA GGCAGAGAA? 5940 
CACTTGAACC CAGGCAGTGG AGGTTGCAGT GAGCCGAGAT CGTGCCATTG CACTCCAGCC 6000 
TGGGCAACAA GAGCGAAACT CTGTATCCAA AAAACAAAAG AAAAGAAAAG GTAACCTTGA 6060 
ACTATGTGAG ATCTTTAGAA ATGCATTCTT TCTGTAAAAT GTGACTACAT TTGCCTTATT 6120 
TATGCTAAAA ATGTTGAGGC CTCAAACAAC CCATATTTTC TCGGTCTCCC CCCTGCCTAG 6180 
CCTTTGTTCA CATTGCTTCT TCTTGGTGGA AGCTCTTCCT CTGGCCTTGA AAATGCCTGC 6240 
TTCTCTTTCA AGGTAGCACA GTCATCACTT TCTGTGGTAA CCTTCTCCAG CACCATCAAA 6300 
CAGAAAGAAT GAATCTCTTG TAAATTCAGC TCTTACGTCA TTCATTACAT TATTTTGTAA 6360 
CTCTTTATAG ATTCTTCTCT CCCACTAGAC TCTGAGTCAC TGGAGAGTAG GAGCCAACTC 6420 
TCATTCATGT GTGGTTTGGT CAGCTACTGG CCACATTCCT GATGCATAGT TAATGCTCAA 6480 
ACCTTAACTG GTGAATCAGC TCAAATATTG TCCTTCTCTA AATCCATTCA CTCATTGACT 6540 
AACTATGTAC TCAAAATAGT AAACACCAGT AATTTAATCC AATTCCTGCC CATACTGCTT 6600 
GGTACATTTC AGGTGAATTA GTTTGATAAA TATGTGTGTA TTACATAATA TTAAAGTATG 6660 
TACAGAAGAT CATGCTAATC ATAATTCACA ACTGATAACT AATCAAACAT AAATGCTCTC 6720 
AGGTTAACAA ATGTCTGCCT TCTCAGTTAA TGCAGTCATT AACAAACACC TTCTGATGCT 6780 
GATAATAGGG CCTTGTTCAG CAATGAAGCC ATAAAGGTGA ATAAAGAACA TGCCCTCGTG 6840 
GAGCTCACAG CCTAGTCATT ATTGTTCTGA TTTTTAATAT TAATGTTGGT TTGGGTTTTG 6900 
GTGAAAAATG TTTAGACTTA TCTTAGTGAT CTTTTCATCC TTTGCTATAT TATTTTTCTC 6960 
TAAGAGTCTT CCTTATCCCC TCCTTTAAAA AACTAGGTGA TAATTCTAAA TTGTAAATTT 7020 
AAATATTATA AATAGCTTAT AAAATTTAAT ATTTATAATA TTTAAATGTT TGATAAATAT 7080 
TTAAATTTTA TAATATTTAA ATGTTTATTT AAATTCATTT GTACATCAGT TTTTATTTTA 7140 
TTTAAATGTG TTGGCCAGGC ATGGTGGCTG ACACCTATAA TCCCACAACT TTGAGAGGCC 7200 
AAGTCAGGCA AACCATTTGA GCTCAGGAGT TTGAGACCAC CCTGGGCAAC GTGGTGAAAC 7260 
CCTGTCTCTA CCAAACATAT GAAAACTTAT CTGGGTGTGG TGGCACGCAT CTGTGGTCCC 7320 
AGATGGGAGT CCCAGGCTAA GATGGGAGAA TCGCTTGAAC CCAGGTGAGA GGGGTGGGGT 7380 
GGATGTTGCA GTGAGCTGAG ATCGTGCCAC TGCACTCCAA CCTGGGTGAC AGAGTGAGAC 7440 
TCCATCTCAA AAAAAAAAAA TGTTATCTAA ATAAGATAAA TTTAATAACT GTTCGCACTT 7500 
AGATGAGCAT AAGGAACTAA ACCTAGATAA AACTATCAAA TAAGGCCTGG GTACAGTGAC 7560 
TCATGCCTGT AATCTCAAGC ACTTTGGGAG GCCAAAATTA TACAAAGTTA GTTGTATAAC 7620 
ACCAACTAAC AACTATTTTG GGGTTAGCTT AATTCAGATT AATTTTTTTT AAACTGAGTT 7680 
TTAAATTCCT GCTTACTCTA CCATACATGC TAGGCCTCAT ATTATGCTAG AAAAATTTTG 7740 
AGCACAGATT TATGAATACT CTCCTGCATA CCATTTAATT TTTAAACAAA TTTTAATGCA 7800 
GTATATATGT GCCTTTTTAC CAACACATTA AATAATAAGA TCTACTGTGA GGACTAAATT 7860 
TCTGTAATTT CAAAGTAGTA ATGAGTTTAA ACCATGTCTC AAGATCTCTG CAATAACTGT 7920 
AGCACAACAG AAAATAGGTA TTTCTATTAA TGACAGAGTC ACAAGTACTA CTAATAATAC 7980 
TGTGGTTTGT TTCCTGCAAC TAATCATGGG AGGAATGCTA AATTTCAGAG GTTGGTGAAA 8040 
ATACATGTGT ATTTTTTTCC CCATCCAAGT TCACAGATTT CTCACACTGA GAACTCCTAT 8100 
TCCATAACAA AATTCTGGAA GCCTGCACAC CGTATTGGAA GAAGGGCAGA AAGGAAAAGC 8160 
AAATGGAAGG ATTTAAATTT TTTTCAAATC CTGTATCCCT TGATTTTACA GCAAGATTGT 8220 
ATTTATGTAT TACTTGTGTT AAAAATATAG TATAATCGAG ACTCCAGATC AAAAATCACC 8280 
GCAGCTCAGG GAGAAAGAGG GCCACCAAAT GCCAGAGCCC TTCAGCCTTC TCCCACCCTG 8340 
CCTGTACCCT CAGATGGAAG CACTTTTTTA TCATTGTTTC ACCTTTAGCA TTTTGACAAT 8400 
GAAGTCACAA ACCTTCAGCC TCTCACCCAT AGGAACCCAC TGGTTGTAAG AGAAGGATGA 8460 
AGCCAGTCCT TCCTAAAGGG CACGATTAGA TGTGTTTATG GCATCCTCAG GTCAAACTAT 8520 
ATTTATATTG ACAATATATT TATATTTCTC AAGGAATACT AGAATAATGA TTCAGTTCAG 8580 
TACTAGGCCA TTTATCTACC CTTTATAATA TTGTTTAATG AGAAAATGCT TTCTATCTTC 8640 
CAAATATCTG ATGATTTGTA AGAGAACACT TAAACATGGG TATTCATAAG CTGAAACTTC 8700 
TGGCATTTAT TGAATGTCAA GATTGTTCAT CAGTATACTA GGTGATTAAC TGACCACTGA 8760 
ACTTGAAGGT AGTATAAAGT AGTAGTAAAA GGTACAATCA TTGTCTCTTA ACAGATGGCT 8820 
CTTTGCTTTC ATTAG 8835 



SEQ ID NO: 11: 

GTAAGGCTAA TGCCATAGAA CAAATACCAG GTTCAGATAA ATCTATTCAA TTAGAAAAGA 60 
TGTTGTGAGG TGAACTATTA AGTGACTCTT TGTGTCACCA AATTTCACTG TAATATTAAT 120 
GGCTCTTAAA AAAATAGTGG ACCTCTAGAA ATTAACCACA ACATGTCCAA GGTCTCAGCA 180 
CCTTGTCACA CCACGTGTCC TGGCACTTTA ATCAGCAGTA GCTCACTCTC CAGTTGGCAG 240 
TAAGTGCACA TCATGAAAAT CCCAGTTTTC ATGGGAAAAT CCCAGTTTTC ATTGGATTTC 300 
CATGGGAAAA ATCCCAGTAC AAAACTGGGT GCATTCAGGA AATACAATTT CCCAAAGCAA 360 
ATTGGCAAAT TATGTAAGAG ATTCTCTAAA TTTAGAGTTC CGTGAATTAC ACCATTTTAT 420 
GTAAATATGT TTGACAAGTA AAAATTGATT CTTTTTTTTT TTTTCTGTTG CCCAGGCTGG 480 
AGTGCAGTGG CACAATCTCT GCTCACTGCA ACCTCCACCT CCTGGGTTCA AGCAATTCTC 540 
CTGCCTCAGC CTTCTGAGTA GCTGGGACTA CAGGTGCATC CCGCCATGCC TGGCTAATTT 600 
TTGGGTATTT TTACTAGAGA CAGGGTTTTG GCATGTTGTC CAGGCTGGTC TTGGACTCCT 660 
GATCTCAGAT GATCCTCCTG GCTCGGGCTC CCAAAGTGCT GGGATTACAG GCATGAACCA 720 
CCACACATGG CCTAAAAATT GATTCTTATG ATTAATCTCC TGTGAACAAT TTGGCTTCAT 780 
TTGAAAGTTT GCCTTCATTT GAAACCTTCA TTTAAAAGCC TGAGCAACAA AGTGAGACCC 840 
CATCTCTACA AAAAACTGCA AAATATCCTG TGGACACCTC CTACCTTCTG TGGAGGCTGA 900 
AGCAGGAGGA TCACTTGAGC CTAGGAATTT GAGCCTGCAG TGAGCTATGA TCCCACCCCT 960 
ACACTCCAGC CTGCATGACA GTAGACCCTG ACACACACAC ACAAAAAAAA ACCTTCATAA 1020 
AAAATTATTA GTTGACTTTT CTTAGGTGAC TTTCCGTTTA AGCAATAAAT TTAAAAGTAA 1080 
AATCTCTAAT TTTAGAAAAT TTATTTTTAG TTACATATTG AAATTTTTAA ACCCTAGGTT 1140 
TAAGTTTTAT GTCTAAATTA CCTGAGAACA CACTAAGTCT GATAAGCTTC ATTTTATGGG 1200 
CCTTTTGGAT GATTATATAA TATTCTGATG AAAGCCAAGA CAGACCCTTA AACCATAAAA 1260 
ATAGGAGTTC GAGAAAGAGG AGTAGCAAAA GTAAAAGCTA GAATGAGATT GAATTCTGAG 1320 
TCGAAATACA AAATTTTACA TATTCTGTTT CTCTCTTTTT CCCCCTCTTA G 1371 



SEQ ID NO: 12: 

GTAAAGTAGA AATGAATTTA TTTTTCTTTG CAAACTAAGT ATCTGCTTGA GACACATCTA 60 
TCTCACCATT GTCAGCTGAG GAAAAAAAAA AATCGTTCTC ATGCTACCAA TCTGCCTTCA 120 
AAGAAATGTG GACTCAGTAG CACAGCTTTG GAATGAAGAT GATCATAAGA GATACAAAGA 180 
AGAACCTCTA GCAAAAGATG CTTCTCTATG CCTTAAAAAA TTCTCCAGCT CTTAGAATCT 240 
ACAAAATAGA CTTTGCCTGT TTCATTGGTC CTAAGATTAG CATGAAGCCA TGGATTCTGT 300 
TGTAGGGGGA GCGTTGCATA GGAAAAAGGG ATTGAAGCAT TAGAATTGTC CAAAATCAGT 360 
AACACCTCCT CTCAGAAATG CTTTGGGAAG AAGCCTGGAA GGTTCCGGGT TGGTGGTGGG 420 
GTGGGGCAGA AAATTCTGGA AGTAGAGGAG ATAGGAATGG GTGGGGCAAG AAGACCACAT 480 
TCAGAGGCCA AAAGCTGAAA GAAACCATGG CATTTATGAT GAATTCAGGG TAATTCAGAA 540 
TGGAAGTAGA GTAGGAGTAG GAGACTGGTG AGAGGAGCTA GAGTGATAAA CAGGGTGTAG 600 
AGCAAGACGT TCTCTCACCC CAAGATGTGA AATTTGGACT TTATCTTGGA GATAATAGGG 660 
TTAATTAAGC ACAATATGTA TTAGCTAGGG TAAAGATTAG TTTGTTGTAA CAAAGACATC 720 
CAAAGATACA GTAGCTGAAT AAGATAGAGA ATTTTTCTCT CAAAGAAAGT CTAAGTAGGC 780 
AGCTCAGAAG TAGTATGGCT GGAAGCAACC TGATGATATT GGGACCCCCA ACCTTCTTCA 840 
GTCTTGTACC CATCATCCCC TAGTTGTTGA TCTCACTCAC ATAGTTGAAA ATCATCATAC 900 
TTCCTGGGTT CATATCCCAG TTATCAAGAA AGGGTCAAGA GAAGTCAGGC TCATTCCTTT 960 
CAAAGACTCT AATTGGAAGT TAAACACATC AATCCCCCTC ATATTCCATT GACTAGAATT 1020 
TAATCACATG GCCACACCAA GTGCAAGGAA ATCTGGAAAA TATAATCTTT ATTCCAGGTA 1080 
GCCATATGAC TCTTTAAAAT TCAGAAATAA TATATTTTTA AAATATCATT CTGGCTTTGG 1140 
TATAAAGAAT TGATGGTGTG GGGTGAGGAG GCCAAAATTA AGGGTTGAGA GCCTATTATT 1200 
TTAGTTATTA CAAGAAATGA TGGTGTCATG AATTAAGGTA GACATAGGGG AGTCCTGATG 1260 
AGGAGCTGTG AATGGATTTT AGAAACACTT GAGAGAATCA ATAGGACATG ATTTAGGGTT 1320 
GGATTTGGAA AGGAGAAGAA AGTAGAAAAG ATGATGCCTA CATTTTTCAC TTAGGCAATT 1380 
TGTACCATTC AGTGAAATAG GGAACACAGG AGGAAGAGCA GGTTTTGGTG TATACAAAGA 1440 
GGAGGATGGA TGACGCATTT CGTTTTGGAT CTGAGATGTC TGTGGAACGT CCTAGTGGAG 1500 
ATCTCCACAA ACTCTTCTAC ATGTGGTTCT GAGTTCAGGA CACAGATTTG GGCTGGAGAT 1560 
AGAGATATTG TAGGCTTATA CATAGAAATG GCATTTGAAT CTATAGAGAT AAAAAGACAC 1620 
ATCAGAGGAA ATGTGTAAAG TGAGAGAGGA AAAGCCAAGT ACTGTCCTGG GGGGAATACC 1680 
TACATTTAAA GGATGCAGTA GAAAGAAGCT AATAAACAAC AGAGAGCAGA CTAACCAAAA 1740 
GGGGAGAAGA AAAACCAAGA GAATTCCACC GACTCCCAGG AGAGCATTTC AAGATTGAGG 1800 
GGATAGGTGT TGTGTTGAAT TTTGCAGCCT TGAGAATCAA GGGCCAGAAC ACAGCTTTTA 1860 
GATTTAGCAA CAAGGAGTTT GGTGATCTCA GTGAAAGCAG CTTGATGGTG AAATGGAGGC 1920 
AGAGGCAGAT TGCAATGAGT GAAACAGTGA ATGGGAAGTG AAGAAATGAT ACAGATAATT 1980 
CTTGCTAAAA GCTTGGCTGT TAAAAGGAGG AGAGAAACAA GACTAGCTGC AAAGTGAGAT 2040 
TGGGTTGATG GAGCAGTTTT AAATCTCAAA ATAAAGAGCT TTGTGCTTTT TTGATTATGA 2100 
AAATAATGTG TTAATTGTAA CTAATTGAGG CAATGAAAAA AGATAATAAT ATGAAAGATA 2160 
AAAATATAAA AACCACCCAG AAATAATGAT AGCTACCATT TTGATACAAT ATTTCTACAC 2220 
TCCTTTCTAT GTATATATAC AGACACAGAA ATGCTTATAT TTTTATTAAA AGGGATTGTA 2280 
CTATACCTAA GCTGCTTTTT CTAGTTAGTG ATATATATGG ACATCTCTCC ATGGCAACGA 2340 
GTAATTGCAG TTATATTAAG TTCATGATAT TTCACAATAA GGGCATATCT TTGCCCTTTT 2400 
TATTTAATCA ATTCTTAATT GGTGAATGTT TGTTTCCAGT TTGTTGTTGT TATTAACAAT 2460 
GTTCCCATAA GCATTCCTGT ACACCAATGT TCACACATTT GTCTGATTTT TTCTTCAGCA 2520 
TAAAACCCAC GAGGTAGAAT TGCTGGGTTG ATAGAAGAGA AAGGATGATT GCCAAATTAA 2580 
AGCTTCAGTA GAGGGTACAT GCCGAGCACA AATGGGATCA GCCCTAGATA CCAGAAATGG 2640 
CACTTTCTCA TTTCCCCTTG GGACAAAAGG GAGAGAGGCA ATAACTGTGC TGCCAGAGTT 2700 
AAATTTGTAC GTGGACTAGC AGGAAATCAT TTGCTGAAAA TGAAAACAGA GATGATGTTG 2760 
TAGAGCTCCT GAAGAGAGCA AAGAAAATTT GAAATTGCGG CTATCAGCTA TGGAAGAGAG 2820 
TGCTGAACTG GAAAACAAAA GAAGTATTGA CAATTGGTAT GCTTGTAATG GCACCGATTT 2880 
GAACGCTTGT GCCATTGTTC ACCAGCAGCA CTCAGCAGCC AAGTTTGGAG TTTTGTAGCA 2940 
GAAAGACAAA TAAGTTAGGG ATTTAATATC CTGGCCAAAT GGTAGACAAA ATGAACTCTG 3000 
AGATCCAGCT GCACAGGGAA GGAAGGGAAG ACGGGAAGAG GTTAGATAGG AAATACAAGA 3060 
GTCAGGAGAC TGGAAGATGT TGTGATATTT AAGAACACAT AGAGTTGGAG TAAAAGTGTA 3120 
AGAAAACTAG AAGGGTAAGA GACCGGTCAG AAAGTAGGCT ATTTGAAGTT AACACTTCAG 3180 
AGGCAGAGTA GTTCTGAATG GTAACAAGAA ATTGAGTGTG CCTTTGAGAG TAGGTTAAAA 3240 
AACAATAGGC AACTTTATTG TAGCTACTTC TGGAACAGAA GATTGTCATT AATAGTTTTA 3300 
GAAAACTAAA ATATATAGCA TACTTATTTG TCAATTAACA AAGAAACTAT GTATTTTTAA 3360 
ATGAGATTTA ATGTTTATTG TAG 3383. 



11. The genomic DNA of claim 9, which comprises three additional introns with respective nucleotide sequences given 
in SEQ ID NOs: 10 to 12. 

12. The genomic DNA of claim 1, which has a nucleotide sequence selected from the group consisting of SEQ ID NO: 
13 and its complementary sequence; 



SEQ ID NO: 13: 

AAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA 48 

Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala 
-35 -30 
ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA 
Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile 

-20 -15 
AGAACAAATA CCAGGTTCAG ATAAATCTAT TCAATTAGAA 
ATTAAGTGAC TCTTTGTGTC ACCAAATTTC ACTGTAATAT 
GTGGACCTCT AGAAATTAAC CACAACATGT CCAAGGTCTC 
GTCCTGGCAC TTTAATCAGC AGTAGCTCAC TCTCCAGTTG 
AAATCCCAGT TTTCATGGGA AAATCCCAGT TTTCATTGGA 
GTACAAAACT GGGTGCATTC AGGAAATACA ATTTCCCAAA 
AGAGATTCTC TAAATTTAGA GTTCCGTGAA TTACACCATT 
AGTAAAAATT GATTCTTTTT TTTTTTTTCT GTTGCCCAGG 
CTCTGCTCAC TGCAACCTCC ACCTCCTGGG TTCAAGCAAT 
AGTAGCTGGG ACTACAGGTG CATCCCGCCA TGCCTGGCTA 
GAGACAGGGT TTTGGCATGT TGTCCAGGCT GGTCTTGGAC 
CCTGGCTCGG GCTCCCAAAG TGCTGGGATT ACAGGCATGA 
AATTGATTCT TATGATTAAT CTCCTGTGAA CAATTTGGCT 
ATTTGAAACC TTCATTTAAA AGCCTGAGCA ACAAAGTGAG 
TGCAAAATAT CCTGTGGACA CCTCCTACCT TCTGTGGAGG 
GAGCCTAGGA ATTTGAGCCT GCAGTGAGCT ATGATCCCAC 
GACAGTAGAC CCTGACACAC ACACACAAAA AAAAACCTTC 
TTTTCTTAGG TGACTTTCCG TTTAAGCAAT AAATTTAAAA 
AAATTTATTT TTAGTTACAT ATTGAAATTT TTAAACCCTA 
ATTACCTGAG AACACACTAA GTCTGATAAG CTTCATTTTA 
ATAATATTCT GATGAAAGCC AAGACAGACC CTTAAACCAT 
GAGGAGTAGC AAAAGTAAAA GCTAGAATGA GATTGAATTC 
TACATATTCT GTTTCTCTCT TTTTCCCCCT CTTAG CT 

Ala 
-10 

GTAGAAATGA ATTTATTTTT CTTTGCAAAC TAAGTATCTG CTTGAGACAC 
CCATTGTCAG CTGAGGAAAA AAAAAAATGG TTCTCATGCT ACCAATCTGC 
ATGTGGACTC AGTAGCACAG CTTTGGAATG AAGATGATCA TAAGAGATAC 
CTCTAGCAAA AGATGCTTCT CTATGCCTTA AAAAATTCTC CAGCTCTTAG 
ATAGACTTTG CCTGTTTCAT TGGTCCTAAG ATTAGCATGA AGCCATGGAT 
GGGGAGCGTT GCATAGGAAA AAGGGATTGA AGCATTAGAA TTGTCCAAAA 
CTCCTCTCAG AAATGCTTTG GGAAGAAGCC TGGAAGGTTC CGGGTTGGTG 
GCAGAAAATT CTGGAAGTAG AGGAGATAGG AATGGGTGGG GCAAGAAGAC 
GGCCAAAAGC TGAAAGAAAC CATGGCATTT ATGATGAATT CAGGGTAATT 
GTAGAGTAGG AGTAGGAGAC TGGTGAGAGG AGCTAGAGTG ATAAACAGGG 
GACGTTCTCT CACCCCAAGA TGTGAAATTT GGACTTTATC TTGGAGATAA 
TAAGCACAAT ATGTATTAGC TAGGGTAAAG ATTAGTTTGT TGTAACAAAG 
ATACAGTAGC TGAATAAGAT AGAGAATTTT TCTCTCAAAG AAAGTCTAAG 
AGAAGTAGTA TGGCTGGAAG CAACCTGATG ATATTGGGAC CCCCAACCTT 
GTACCCATCA TCCCCTAGTT GTTGATCTCA CTCACATAGT TGAAAATCAT 
GGGTTCATAT CCCAGTTATC AAGAAAGGGT CAAGAGAAGT CAGGCTCATT 
ACTCTAATTG GAAGTTAAAC ACATCAATCC CCCTCATATT CCATTGACTA 
ACATGGCCAC ACCAAGTGCA AGGAAATCTG GAAAATATAA TCTTTATTCC 
ATGACTCTTT AAAATTCAGA AATAATATAT TTTTAAAATA TCATTCTGGC 
AGAATTGATG GTGTGGGGTG AGGAGGCCAA AATTAAGGGT TGAGAGCCTA 
TATTACAAGA AATGATGGTG TCATGAATTA AGGTAGACAT AGGGGAGTGC 
CTGTGAATGG ATTTTAGAAA CACTTGAGAG AATCAATAGG ACATGATTTA 
TGGAAAGGAG AAGAAAGTAG AAAAGATGAT GCCTACATTT TTCACTTAGG 
CATTCAGTGA AATAGGGAAC ACAGGAGGAA GAGCAGGTTT TGGTGTATAC 
ATGGATGACG CATTTCGTTT TGGATCTGAG ATGTCTGTGG AACGTCCTAG 
CACAAACTCT TCTACATGTG GTTCTGAGTT CAGGACACAG ATTTGGGCTG 
TATTGTAGGC TTATACATAG AAATGGCATT TGAATCTATA GAGATAAAAA 
AGGAAATGTG TAAAGTGAGA GAGGAAAAGC CAAGTACTGT GCTGGGGGGA 



-25 

G GTAAGG CTAATGCCAT 
Ala 
-10 

AAGATGTTGT GAGGTGAACT 
TAATGGCTCT TAAAAAAATA 
AGCACCTTGT CACACCACCT 
GCAGTAAGTG CACATCATGA 
TTTCCATGGG AAAAATCCCA 
GCAAATTGGC AAATTATGTA 
TTATGTAAAT ATGTTTGACA 
CTGGAGTGCA GTGGCACAAT 
TCTCCTGCCT CAGCCTTCTG 
ATTTTTGGGT ATTTTTACTA 
TCCTGATCTC AGATGATCCT 
ACCACCACAC ATGGCCTAAA 
TCATTTGAAA GTTTGCCTTC 
ACCCCATCTC TACAAAAAAC 
CTGAAGCAGG AGGATCACTT 
CCCTACACTC CAGCCTGCAT 
ATAAAAAATT ATTAGTTGAC 
GTAAAATCTC TAATTTTAGA 
GGTTTAAGTT TTATGTCTAA 
TGGGCCTTTT GGATGATTAT 
AAAAATAGGA GTTCGAGAAA 
TGAGTCGAAA TACAAAATTT 
GAA GAT GAT G GTAAA 
Glu Asp Asp Glu 



ATCTATCTCA 

CTTCAAAGAA 

AAAGAAGAAC 

AATCTACAAA 

TCTGTTGTAG 

TCAGTAACAC 

GTGGGGTGGG 

CACATTCAGA 

CAGAATGGAA 

TGTAGAGCAA 

TAGGGTTAAT 

ACATCCAAAG 

TAGGCAGCTC 

CTTCAGTCTT 

CATACTTCCT 

CCTTTCAAAG 

GAATTTAATC 

AGGTAGCCAT 

TTTGGTATAA 

TTATTTTAGT 

TGATGAGGAG 

GGGTTGGATT 

CAATTTGTAC 

AAAGAGGAGG 

TGGAGATGTC 

GAGATAGAGA 

GACACATCAG 

ATACCTACAT 



98 



158 
218 
278 
338 
398 
458 
518 
578 
638 
698 
758 
818 
878 
938 
998 
1058 
1118 
1178 
1238 
1298 
1358 
1418 
1470 



1530 

1590 

1650 

1710 

1770 

1830 

1890 

1950 

2010 

2070 

2130 

2190 

2250 

2310 

2370 

2430 

2490 

2550 

2610 

2670 

2730 

2790 

2850 

2910 

2970 

3030 

3090 

3150 



TTAAAGGATG CAGTAGAAAG AAGCTAATAA ACAACAGAGA 
GAAGAAAAAC CAAGAGAATT CCACCGACTC CCAGGAGAGC 
GGTGTTGTGT TGAATTTTGC AGCCTTGAGA ATCAAGGGCC 
AGCAACAAGG AGTTTGGTGA TCTCAGTGAA AGCAGCTTGA 
CAGATTGCAA TGAGTGAAAC AGTGAATGGG AAGTGAAGAA 
TAAAAGCTTG GCTGTTAAAA GGAGGAGAGA AACAAGACTA 
TGATGGAGCA GTTTTAAATC TCAAAATAAA GAGCTTTGTG 
ATGTGTTAAT TGTAACTAAT TGAGGCAATG AAAAAAGATA 
ATAAAAACCA CCCAGAAATA ATGATAGCTA CCATTTTGAT 
TCTATGTATA TATACAGACA CAGAAATGCT TATATTTTTA 
CCTAAGCTGC TTTTTCTAGT TAGTGATATA TATGGACATC 
TGCAGTTATA TTAAGTTCAT GATATTTCAC AATAAGGGCA 
AATCAATTCT TAATTGGTGA ATGTTTGTTT CCAGTTTGTT 
CATAAGCATT CCTGTACACC AATGTTCACA CATTTGTCTG 
CCCAGGAGGT AGAATTGCTG GGTTGATAGA AGAGAAAGGA 
CAGTAGAGGG TACATGCCGA GCACAAATGG GATCAGCCCT 
TCTCATTTCC CCTTGGGACA AAAGGGAGAG AGGCAATAAC 
TGTACGTGGA GTAGCAGGAA ATCATTTGCT GAAAATGAAA 
GTCCTGAAGA GAGCAAAGAA AATTTGAAAT TGCGGCTATC 
AACTGGAAAA CAAAAGAAGT ATTGACAATT GGTATGCTTG 
CTTGTGCCAT TGTTCACCAG CAGCACTCAG CAGCCAAGTT 
ACAAATAAGT TAGGGATTTA ATATCCTGGC CAAATGGTAG 
CAGCTGCACA GGGAAGGAAG GGAAGACGGG AAGAGGTTAG 
GAGACTGGAA GATGTTGTGA TATTTAAGAA CACATAGAGT 
ACTAGAAGGG TAAGAGACCG GTCAGAAAGT AGGCTATTTG 
GAGTAGTTCT GAATGGTAAC AAGAAATTGA GTGTGCCTTT 
TAGGCAACTT TATTGTAGCT ACTTCTGGAA CAGAAGATTG 






CTAAAATATA TAGCATACTT 
ATTTAATGTT TATTGTAG 



ATTTGTCAAT TAACAAAGAA 
AA AAC CTG GAA TCA GAT 
Glu Asn Leu Glu Ser Asp 
-5 

GAA TCT AAA TTA TCA GTC ATA AGA AAT TTG AAT 
Glu Ser Lys Leu Ser Val Ile Arg Asn Leu Asn 



10 



15 



ATT GAC CAA GGA AAT CGG CCT CTA TTT GAA GAT 
Ile Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp 



25 



30 



50 



TGT AGA G GTATTTTTT TTAATTCGCA AACATAGAAA 
Cys Arg Asp 
40 

TTCTGTTTTA CTGCTTACAT TGTTCCGTGC TAGTCCCAAT 
GAGTGACAAT AATTTCACTT ACAGGAAACT TTATAAGGCA 
TAAAAAATTG GAT AC AAT AA GACATTGCTA GGGGTCATGC 
ATCACCAATC CCTTTATTGT GATTGCATTA ACTGTTTAAA 
AATCCCTGCT TGTTACAGCT GAAAATGCTG ATAGTTTACC 
GTAATCCTAG CTACTTGGGA GGCTCAAGCA GGAGGATTGC 
CTGTAGTACA CTGTGATCGT ACCTGTGAAT AGCCACTGCA 
AGACCTTGTC TCTAAAATTA AAAAAAAAAA AAAAAAAAAC 
AAGTCTACTG TGCCTTCCAA AACATGAATT CCAAATATCA 
AGTGAATGTG CATTCTTTAA AAATACTGAA TACTTACCTT 
TATTTAGCAT TTAAAAGTTA AAAACAATCT TTTAGAATTC 
AAAGTTGCAG CGTGTGTGTT GTAATACACA TTAAACTGTG 
ATGCAGTTTC ACTCTGTCAC CCAGGCTGAA GTGCAGTGCA 
CTCACTACAA CCTCCACCTC CCACGTTCAA GCGATTCTCA 
GTGGGATTAC AGGCATGCAC CACTTACACC CGGCTAATTT 
GGGTTTCACC ATGTTGGCCA GGCTGGTCTC AAACCCCTAA 
TCAGCCTCCC AAACAAACAA ACAACCCCAC AGTTTAATAT 



GCAGACTAAC CAAAAGGGGA 
ATTTCAAGAT TGAGGGGATA 
AGAACACAGC TTTTAGATTT 
TGGTGAAATG GAGGCAGAGG 
ATGATACAGA TAATTCTTGC 
GCTGCAAAGT GAGATTGGGT 
CTTTTTTGAT TATGAAAATA 
ATAATATGAA AGATAAAAAT 
ACAATATTTC TACACTCCTT 
TTAAAAGGGA TTGTACTATA 
TCTCCATGGC AACGAGTAAT 
TATCTTTGCC CTTTTTATTT 
GTTGTTATTA ACAATGTTCC 
ATTTTTTCTT CAGGATAAAA 
TGATTCCCAA ATTAAAGCTT 
AGATACCAGA AATGGCACTT 
TGTGCTGCCA GAGTTAAATT 
ACAGAGATGA TGTTGTAGAG 
AGCTATGGAA GAGAGTGCTG 
TAATGGCACC GATTTGAACG 
TGGAGTTTTG TAGCAGAAAG 
ACAAAATGAA CTCTGAGATC 
ATAGGAAATA CAAGAGTCAG 
TGGAGTAAAA GTGTAAGAAA 
AAGTTAACAC TTCAGAGGCA 
GAGAGTAGGT TAAAAAACAA 
TCATTAATAG TTTTAGAAAA 
ACTATGTATT TTTAAATGAG 
TAC TTT GGC AAG CTT 
Tyr Phe Gly Lys Leu 
1 5 
GAC CAA GTT CTC TTC 
Asp Gln Val Leu Phe
20 

ATG ACT GAT TCT GAC 
Met Thr Asp Ser Asp 
35 

TGACTAGCTA CTTCTTCCCA 



CCTCAGATGA 
TCCACGTTTT 
CTCTCTGAGC 
ACCTCTATAG 
AGGTGTGGTG 
TTGAGGCCAG 
CTCCAGCCTG 
CTTAGGAAAG 
AAGTTAGGCT 
AACATATATT 
ATATCTTTAA 
GGGTTGTTTG 
GTGCAGTGGT 
TGCCTCAGTC 
TTGTATTTTT 
CCTCAAGTGA 
GTGTTACAAC 



AAAGTCACAG 
TTAGTTGGGG 
CTGCCTTTGA 
TTGGATGCTT 
GCATCTATCT 
GACTTTGAGG 
GGTGATATAC 
GAAATTGATC 
GAGTTGAAGC 
TTAAATATTT 
AATACTCAAA 
TTTGTTTGAG 
GTGATCTCGG 
TCCCGAGTAG 
AGTAGAGCTG 
TCTGCCTGCC 
ACACATGCTG 



3210 

3270 

3330 

3390 

3450 

3510 

3570 

3630 

3690 

3750 

3810 

3870 

3930 

3990 

4050 

4110 

4170 

4230 

4290 

4350 

4410 

4470 

4530 

4590 

4650 

4710 

4770 

4830 

4880 



4928 



4976 



5032 



5092 
5152 
5212 
5272 
5332 
5392 
5452 
5512 
5572 
5632 
5692 
5752 
5812 
5872 
5932 
5992 
6052 
CAACTTTTAT GAGTATTTTA ATGATATAGA TTATAAAAGG TTGTTTTTAA CTTTTAAATC 6112

CTGGGATTAC AGGCATGAGC CACTGTGCCA GGCCTGAACT GTGT7TTTAA AAATGTCTGA 6172 

CCAGCTGTAC ATAGTCTCCT GCAGACTGGC CAAGTCTCAA AGTGGGAACA GGTGTATTAA 6232 

GGACTATCCT TTGGTTAAAT TTCCGCAAAT GTTCCTGTGC AAGAATTCTT CTAACTAGAG 6292 

TTCTCATTTA TTATATTTAT TTCAG AT AAT GCA CCC CGG ACC ATA TTT ATT 6343 

Asp Asn Ala Pro Arg Thr Ile Phe Ile
40 45 

ATA AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG GCT GTA ACT ATC 6391 
Ile Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile

50 55 60 

TCT GTG AAG TGT GAG AAA ATT TCA ACT CTC TCC TGT GAG AAC AAA ATT 6439
Ser Val Lys Cys Glu Lys Ile Ser Thr Leu Ser Cys Glu Asn Lys Ile

65 70 75 80

ATT TCC TTT AAG GTAAG ACTGAGCCTT ACTTTGTTTT CAATCATGTT AATATAATCA 6496 
Ile Ser Phe Lys

ATATAATTAG AAATATAACA TTATTTCTAA TGTTAATATA AGTAATGTAA TTAGAAAACT 6556 

CAAATATCCT CAGACCAACC TTTTGTCTAG AACAGAAATA ACAAGAACCA GAGAACCATT 6616 

AAAGTGAATA CTTACTAAAA ATTATCAAAC TCTTTACCTA TTGTGATAAT GATGGTTTTT 6676

CTGAGCCTGT CACAGGGGAA GAGGAGATAC AACACTTGTT TTATGACCTG CATCTCCTGA 6736 

ACAATCAGTC TTTATACAAA TAATAATGTA GAATACATAT GTGAGTTATA CATTTAAGAA 6796 

TAACATGTGA CTTTCCAGAA TGAGTTCTGC TATGAAGAAT GAAGCTAATT ATCCTTCTAT 6856 

ATTTCTACAC CTTTGTAAAT TATGATAATA TTTTAATCCC TAGTTGTTTT GTTGCTGATC 6916 

CTTAGCCTAA GTCTTAGACA CAAGCTTCAG CTTCCAGTTG ATGTATGTTA TTTTTAATGT 6976 

TAATCTAATT GAATAAAAGT TATGAGATCA GCTGTAAAAG TAATGCTATA ATTATCTTCA 7036 

AGCCAGGTAT AAAGTATTTC TGGCCTCTAC TTTTTCTCTA TTATTCTCCA TTATTATTCT 7096 

CTATTATTTT TCTCTATTTC CTCCATTATT GTTAGATAAA CCACAATTAA CTATAGCTAC 7156

AGACTGAGCC AGTAAGAGTA GCCAGGGATG CTTACAAATT GGCAATGCTT CAGAGGAGAA 7216 

TTCCATGTCA TGAAGACTCT TTTTGAGTGG AGATTTGCCA ATAAATATCC GCTTTCATGC 7276 

CCACCCAGTC CCCACTGAAA GACAGTTAGG ATATGACCTT AGTGAAGGTA CCAAGGGGCA 7336 

ACTTGGTAGn GAGAAAAAAG CCACTCTAAA ATATAATCCA AGTAAGAACA GTGCATATGC 7396 

AACAGATAC:A GCCCCCAGAC AAATCCCTCA GCTATCTCCC TCCAACCAGA GTGCCACCCC 7456 

TTCAGGTGAC AATTTGGAGT CCCCATTCTA GACCTGACAG GCAGCTTAGT TATCAAAATA 7516

GCATAAGAGG CCTGGGATGG AAGGGTAGGG TGGAAAGGGT TAAGCATGCT GTTACTGAAC 7576 

AACATAATTA CAAGGGAAGG AGATGGCCAA GCTCAAGCTA TGTGGGATAG AGGAAAACTC 7636 

AGCTGCAGAG GCAGATTCAG AAACTGCGAT AAGTCCGAAC CTACAGGTGG ATTCTTGTTG 7696 

AGGGASACT'J GTGAAAATGT TAAGAAGATG GAAATAATGC TTGGCACTTA GTAGGAACTG 7756 

GGCAAATCCA TATTTGGGGG AGCCTGAAGT TTATTCAATT TTGATGGCCC TTTTAAATAA 7816 

AAAGAATGTG GCTGGGCGTG GTGGCTCACA CCTGTAATCC CAGCACTTTG GGAGGCCGAG 7876

GGGGGCGGAT CACCTGAAGT CAGGAGTTCA AGACCAGCCT GACCAACATG GAGAAACCCC 7936 

ATCTCTACTA AAAATACAAA ATTAGCTGGG CGTGGTGGCA TATGCCTGTA ATCCCAGCTA 7996 

CTCGGGAGGC TGAGGCAGGA GAATCTTTTG AACCCGGGAG GCAGAGCTTG CGATGAGCCT 8056 

AGATCGTGCC ATTGCACTCC AGCCTGGGCA ACAAGAGCAA AACTCGGTCT CAAAAAAAAA 8116 

AAAAAAAAAG TGAAATTAAC CAAAGGCATT AGCTTAATAA TTTAATACTG TTTTTAAGTA 8176 

GGGCGGCGGG TGGCTGGAAG AGATCTGTGT AAATGAGGGA ATCTGACATT TAAGCTTCAT 8236

CAGCATCATA GCAAATCTGC TTCTGGAAGG AACTCAATAA ATATTAGTTG GAGGGGGGGA 8296 

GAGAGTGAGG GGTGGACTAG GACCAGTTTT AGCCCTTGTC TTTAATCCCT TTTCCTGCCA 8356

CTAATAAGGA TCTTAGCAGT GGTTATAAAA GTGGCCTAGG TTCTAGATAA TAAGATACAA 8416 

CAGGCChGGC ACAGTGGCTC ATGCCTATAA TCCCAGCACT TTGGGAGGGC AAGGCGAGTG 8476 

TCTCACTTGA GATCAGGAGT TCAAGACCAG CCTGGCCAGC ATGGCGATAC TCTCTCTCTA 8536 

CTAAAAAAAA TACAAAAATT AGCCAGGCAT GGTGGCATGC ACCTGTAATC CCAGCTACTC 8596

GTGAGCCTGA GGCAGAAGAA TCGCTTGAAA CCAGGAGGTG TAGGCTGCAG TGAGCTGAGA 8656 

TCGCACCACT GCACTCCAGC CTGGGCGACA GAATGAGACT TTGTCTCAAA AAAAGAAAAA 8716 

GATACAACAG GCTACCCTTA TGTGCTCACC TTTCACTGTT GATTACTAGC TATAAAGTCC 8776

TATAAAGTTC TTTGGTCAAG AACCTTGACA ACACTAAGAG GGATTTGCTT TGAGAGGTTA 8836 

CTGTCAGAGT CTGTTTCATA TATATACATA TACATGTATA TATGTATCTA TATCCAGGCT 8896 

TGGCCAGGGT TCCCTCAGAC TTTCCAGTGC ACTTGGGAGA TGTTAGGTCA ATATCAACTT 8956 

TCCCTGGATT CAGATTCAAC CCCTTCTGAT GTAAAAAAAA AAAAAAAAAA GAAAGAAATC 9016 



CCTTTCCCCT 

AAGGTCATTG 

GTGGTCAACA 

ATGTGCAATA 

ACTGTAACTT 

TCTGTCGCCC 

GGGTTCACGC 

CCATGCCCAG 

ATGGTCTCGA 

ACAGGCGTGA 

TGTAATGTTA 

ATTTCAGATT 

GTAGACAGCT 

GACCCACACT 

ATTCAGAAGT 

AAGGATGAAG 

GAATCTGTGC 

CATGGAAGAA 

GCACTAACAG 

CAAGTAATCT 

CATTTGGCCT 

ACTACTATGG 

AATACAGCAG 

CACTGTAAGT 

TGTCTCTCTC 

CCTGGAATCC 

GGACCAGCTT 

TAAAAATTAG 

CAGGGGGATT 

ACTTCTGGCT 

ACTAGCCTAA 

AGGACAGAAA 

TTGAGCAAAT 

GGGAAAACTA 

GAGTTAAGAA 

AGAAGCAAAA 

AAATTGTTCA 



CCT CCT 
Pro Pro 

AGA AGT 
Arg Ser 
105 
TAC GAA 
Tyr Glu 
120 

CTC ATT 
Leu Ile



GAT 
Asp 
90 
GTC 
Val 

GGA 
Gly 

TTG 
Leu 



ACT GTT CAA 
Thr Val Gln



TGGAGCACTC AAGTTTCACC AGGTGGGGCT TTCCAAGTTG GGGGTTCTCC 
GGATTGCTTT CACATCCATT TGCTATGTAC CTTCCCTATG ATGGCTGGGA 
TCAAAACTAG GAAAGCTACT GCCCAAGGAT GTCCTTACCT CTATTCTGAA 
AGTGTGATTA AAGAGATTGC CTGTTCTACC TATCCACACT CTCGCTTTCA 
TCTTTTTTTC TTTTTTTCTT TTTTTCTTTT TTTTTGAAAC GGAGTCTCCC 
AGGCTAGAGT GCAGTGGCAC GATCTCAGCT CACTGCAAQC TCTGCCTCCC 
CATTCTCCTG CCTCACCCTC CCAAGCAGCT GGGACTACAG GCGCCTGCCA 
CTAATTTTTT GTATTTTTAG TAGAGACGGG GTTTCACCGT GTTAGCCAGG 
TCTCCTGAAC TTGTGATCCG CCCGCCTCAG CCTCCCAAAG TGCTGGGATT 
GCCATCGCAC CCGGCTCAAC TGTAACTTTC TATACTGGTT CATCTTCCCC 
CTAGAGCTTT TGAAGTTTTG GCTATGGATT ATTTCTCATT TATACATTAG 
AGTTCCAAAT TGATGCCCAC AGCTTAGGGT CTCTTCCTAA ATTGTATATT 
GCAGAAGTGG GTGCCAATAG GGGAACTAGT TTATACTTTC ATCAACTTAG 
TGTTGATAAA GAACAAAGGT CAAGAGTTAT GACTACTGAT TCCACAACTG 
TGGAGATAAC CCCGTGACCT CTGCCATCCA GAGTCTTTCA GGCATCTTTG 
AAATGCTATT TTAATTTTGG AGGTTTCTCT ATCAGTGCTT AGGATCATGG 
TGCCATGAGG CCAAAATTAA GTCCAAAACA TCTACTGGTT CCAGGATTAA 
CCTTAGGTGG TGCCCACATG TTCTGATCCA TCCTGCAAAA TAGACATGCT 
GAAAAGTGCA GGCAGCACTA CCAGTTGGAT AACCTGCAAG ATTATAGTTT 
AACCATTTCT CACAAGGCCC TATTCTGTGA CTGAAACATA CAAGAATCTG 
TCTAAGGCAG GGCCCAGCCA AGGAGACCAT ATTCAGGACA GAAATTCAAG 
AACTGGAGTG CTTGGCAGGG AAGACAGAGT CAAGGACTGC CAACTGAGCC 
GCTTACACAG GAACCCAGGG CCTAGCCCTA CAACAATTAT TGGGTCTATT 
TTTAATTTCA GGCTCCACTG AAAGAGTAAG CTAAGATTCC TGGCACTTTC 
ACAGTTGGCT CAGAAATGAG AACTGGTCAG GCCAGGCATG GTGGCTTACA 
CAGCACTTTG GGAGGCCGAA GTGGGAGGGT CACTTGAGGC CAGGAGTTCA 
AGGCAACAAA GTGAGATACC CCCTGACCCC TTCTCTACAA AAATAAATTT 
CCAAATGTGG i'GG'i'GTATAC TTACAGTCCC AGCTACTCAG GAGGCTGAGG 
GCTTGAGCCC AGGAATTCAA GGCTGCAGTG AGCTATGATT TCACCACTGC 
GGGCAACAGA GCGAGACCCT GTCTCAAAGC AAAAAGAAAA AGAAACTAGA 
GTTTGTGGGA GGAGGTCATC ATCGTCTTTA GCCGTGAATG GTTATTATAG
TTGACATTAG CCCAAAAAGC TTGTGGTCTT TGCTGGAACT CTACTTAATC 
GTGGACACCA CTCAATGGGA GAGGAGAGAA GTAAGCTGTT TGATGTATAC 
GAGGCCTGGA ACTGAATATG CATCCCATGA CAGGGAGAAT AGGAGATTCG 
GGAGAGGAGG TCAGTACTGC TGTTCAGAGA TTTTTTTTAT GTAACTCTTG 
CTACTTTTGT TCTGTTTGGT AATATACTTC AAAACAAACT TCATATATTC 
TGTCCTGAAA TAATTAGGTA ATGTTTTTTT CTCTATAG GAA ATG AAT 

Glu Met Asn 
85 

AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG 
Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln

95 100 
CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA 
Pro Gly His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser

110 115
TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA 
Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys 
125 130 135 

AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC 
Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe

140 145 150 

AAC GAA GAC TAGCTATTAA AATTTCATGC C 
Asn Glu Asp 
155



9076 
9136 
9196 
9256 
9316 
9376 
9436 
9496 
9556 
9616 
9676 
9736 
9796 
9856 
9916 
9976 
10036 
10096 
10156 
10216 
10276 
10336 
10396 
10456 
10516 
10576 
10636 
10696 
10756 
10816 
10876 
10936 
10996 
11056 
11116 
11176 
11233 



11281 
11329 
11377 
11425 
11464 



13. The genomic DNA of claim 1, which has a nucleotide sequence selected from the group consisting of SEQ ID NO: 14 and its complementary sequence;
SEQ ID NO: 14:
ACTTGCCTTA
GTTCAAGAAA 
GTTTAGAAAT 
TATAAAATAG 
GTGGACTATC 
GGACATATAC 
ACATGCTAGA 
CCCAAGAAAC 
TTTCATTTGT 
CACAATCTTA 
AGTAGCTAGG 
GTTTTCAGAG 
CGATACTCCT 
GCCTTAAATT 
GAACCAGTAG 
CTTGTGAAAT 
AAGATCAACT 
AACTATAGTA 
TCATAAAGGC 
AATAGTTTCT 
TATGTTGTAG 
GAGCTTTCTT 
CCATAGGCAT 
TTACAAGCTG 
AGATTTAGGT 
TACTGCTGAT 
TCCTCTAAAT 
TTCCTTTAAC 
TCAAGGTCCT 
CTATGTACTT 
TTAAGATTCT 
GGAAGCTGAG 
GTGAAACCCT 
AATCCCAGCT 
GCAGTGAGCC 
CAAAAAAAAA 
TTCATCTAAT 
AGTGACACCA 
TCCCACTGTC 
TCGGAAGGGA 
TCTTTGAGAT 
AGAAACGATG 
CATATATAGA 
GTATCGTACG 
AGTATTAATA 
GTTGGAAAAT 
ATTTTCCCTT 



AAAGCTTTGC 
AATCATTTAA 
ATAAACATTT 
TCCGGAAATT 
TGGCACTGGA 
ATTTTGTTTA 
AAGCATATGA 
ACCTTGCTCA 
TTGTTTTTGT 
GCTCACTGTA 
ACTACAGGAA 
ACAATGTATT 
GCCTCAGCCT 
AGACTTTAAA 
ATGTTTTCAT 
TTTCCTAAAT 
GG1GGGGGCA 
CCTAGTTATC 
ACAGACTCAC 
TGGAGGCCTA 
GACAATCCTA 
AAAGGGAAGA 
ACAGCTTACC 
ATTTGCCATC 
ACAAACTCCA 
GTTAATTTAG 
CATTCTATCA 
TCAAAAACTT 
GTATTATCTG 
AGCCAAATGA 
TTTATCTTCG 
GCAGGAAGAT 
GTGTCTACTA 
ACTTGGGAAG 
GAGATTGTGC 
AAGGATTCTT 
TCAATCTCAT 
GTAAGACTGA 
CAACCTTACT 
TTTTCCCTAG 
GAGGGGAGCC 
CTGATTTGGG 
AGACAGAAAG 
TGTCCATTCC 
GTATCTAGAA 
ACTTCAGAGA 
TCCCCTTCAC 



ATAGGTAGAC 
GTTATAAAAT 
TATACATCAC 
TCAGAGAAAG 
GACTAAATAA 
TTAAGAAAAA 
CTTAGTCATT 
ATATATTAAA 
GACAAGTTCT 
GCCTCCTAGA 
CATTCCACCA 
GCAGCGTTGC 
CCCAAAGCAC 
TGTGGTTTTA 
AGCAATGAAG 
AATATAATCT 
GTAGTAAAAG 
TTACTTATCA 
TTCTGTCTCT 
TACTTAGTGA 
GCTCTGGGCA 
AATTTGAGTA 
TCCAATTCTC 
ATATTCCGAA 
TGCTACAAGC 
ACTGTCATTA 
ACTGCTATTT 
TCATTGTTCC 
GTGCAAGCCT 
GTCTCTCTGG 
GCCGGGCGCG 
CACCTGAGGT 
AAAATCCAAA 
CTGAGGTGAG 
CATTGCACTC 
CTATCTTCAC 
TTTTTACAAG 
GCTAAATTAG 
TTAAAGTAGC 
GAGTCCAAAT 
CTGTCCATAT 
TAACTTTAAC 
AGCAACAACA 
TGCCAGTACC 
AATACTACAC 
AGCCAACAGG 
CCCCTTCTCT 



AACATTAGAT 
ATAACAAACC 
CATTTAAATC 
ATGAATCTGA 
AGAAAGCAGG 
GCAAATAAAA 
TGAGTTTTTA 
TTTTATTTTG 
CGCTCTGTCA 
TTCAAGTGAT 
TGCCCAGCTA 
CCAGGCTGAT 
TAGGATTACA 
AACTCCTGTT 
CTAAACTGTA 
TCAAGGGAGC 
ACAGGATACT 
CAGCAAAATA 
AGATCTCAAG 
AAAAGCAGCT 
TACGAATACA 
GTATGTAAAA 
TTGGCCTCTT 
GGCACCAGCT 
TCTCTGGAAT 
TCTGTCACTT 
GGGTAATCTT 
AGAATAAGTT 
ACTAGTCCCA 
CAATTCTGCC 
CTGGCTCACG 
CGGGAGTTCG 
CATTAGCCAG 
AGAATCGCTT 
CAGCCTGGGC 
AAAATCTTAA 
TGAGAAAACA 
AACCGAGATC 
TTCAAATTTT 
GTTGAAACCT 
TCAAGTTATC 
ACATCTGTTT 
AATTTGAAAG 
TTTATAGTAT 
ATGCACAGCA 
CAGATTTTTC 
TCTCTCCCCA 



TAATTTCCTT 
TTCTGCATTA 
TTTCTCCAAG 
TTTTCCAAGA 
TACAGTCAAT 
CATTTTTCAG 
TTATTAAGGA 
GTTTTCAACT 
CCTAGGCCAA 
CCTCCTGTCT 
ATTTTGTTTT 
CTGAAACTCT 
GACATGAGCC 
GAAAAAGCGT 
ATTTAGACAG 
AAATCATGTC 
GTGCTCTTTA 
ATTACATAAA 
CTACCAAAAA 
GGAATCAACA 
TTAAATCCCA 
CAGAATAAAA 
GCAATTTCTA 
ACAAAGCTTA 
CCTTCCCTGT 
TCCTAAACTC 
TCAAAACTTT 
GAAATTCCAT 
TCATTTTCAA 
TTGTTTCAGG 
GCTGTAATCC 
AGACCAGCCT 
GCGTGGTGGC 
GAACCCAGGA 
AACAGAGCGA 
TGTTTAAACA 
GGGACAGTGA 
TCACTCGAGT 
ACTTTTACTT 
GGAAGGGTAT 
AATTGACTTT 
GATTAGTCCT 
ATGCTTGTTA 
GTAAGTTTAC 
GTGCTAACTT 
TCTCTTCCCT 
AGTAACACTG 



GCTCACATCT 
TAAGACTGAT 
GCTTCATCTT 
GAGGACAGCT 
AAGATCTTCA 
AAAAAGGCAA 
AATTTACAGG 
AGACTTTGCT 
AGTGTAGTGA 
CAGACTCCTG 
GTTTTGTTTT 
TAGCCTCAAA 
AATGCGCCCA 
CTGGTATCTT 
TAGCCAAATG 
CCAAATGCAA 
AAAGCTCAGT 
ATCCTATGGA 
GAAATCTCCC 
TAGTTCCTCC 
CTTATCTATA 
GATTAAGGCT 
TTATCAGGCT 
GAACAATGCC 
TTCCCACTCC 
AATTTCTCCC 
GATTACTGCA 
GATATGGCCT 
CTACTCCTCT 
ACTGGCTCAG 
CAGCACTTTG 
GGCCAGCATG 
AGGCGCCTGT 
GAGGGAGGTT 
GACTCCACCT 
GGTCTTACAG 
CGGTGGATCA 
CTGAGGTTAT 
TTCCATAAAT 
AGTCTCTGTG 
GTTGTTTTTG 
ATAAAATATG 
AGTAAATTCT 
GTGCTGTAAT 
TGCCTTGGGA 
TCCCCTTCTA 
TGCACCTATG 



60 
120 
180 
240 
300 
360 
420 
480 
540 
600 
660 
720 
780 
840 
900 
960 
1020 
1080 
1140 
1200 
1260 
1320 
1380 
1440 
1500 
1560 
1620 
1680 
1740 
1800 
1860 
1920 
1980 
2040 
2100 
2160 
2220 
2280 
2340 
2400 
2460 
2520 
2580 
2640 
2700 
2760 
2820 
TCAAACGAAA ACTTATAATC AAGTAACTGT TTCTGCAAAA ATAAGTTCGT TTTCCTGTCA 2880

TGGCTCAAGG CCTCAGCAGA TCCAGGCCTG GTGGACGGGC TGGTCTTCGT CGTGTGCCAA 2940

ACACTGACCA CTGCCCTGGC TCTGCCATCT TAGGCTTAGT GACCTGGCTG TTACTAAGCA 3000 

CTGTCCCCTC TGCCCCATGC AGCTGTCTCC TTCTAGTCTT CTCCCTCTTC TCAACGCGAT 3060 

CCTAGCCCCT CAGGCCATTT CACCTCCATT TTCCCTCACT TCCCGCCGCC CCTCCCCACT 3120 

TCCTCCCTAC TGTTGTTTCC GCCCCACTAG AGCCCCTCAG AGAAAGTTTC CATCCTCGCA 3180 

CCCTTCCTTG TGTCACAGCC CGTCACATTC TCACAGGCGC CCATCCCTCC AGCCCCACCC 3240 

CAAGGCCAAT GTACTTCGCG GTATGGGGAC CTTCCTCGTC AGCGAACGCG AGGGAGTGAA 3300

GACCCTGGGC GCGGGGTGCT CGGACTTCGG GGGTGGAGGT GGGAAGCGCG CCGCACTCCC 3360 

AGCAGCCCCT GCACGAGTCA CGTGACAGCT CTCCCACCAC CACCCCCCCC AACTTCCCCA 3420 

CCGTAGCCTC CCAGAGCCAG GCCCCACGGA AAGGCAGCTT TTTCCCGGTT TTCTCCCGCT 3480 

CTTTCCCCTC CACTTGGAAT ACTCGTGAAA CAAAAATCTC TCCCTGCCAC CCTGTGTGTG 3540 

TTTGAACCAG GAAAAAATCT GAAACTGGTC AAGAAAGAAC AAGGAAGACT TGCCAAAGCA 3600 

AGGCCGGTGT GTGTCCCAGC AGCTTAGAAT CTCAGCAAAG CAACACAAAA TAGCACATCC 3660 

ACGGCCTCTT TTCGAGTAAA ATTTACTTGG TTTGTTTGCA GGAAGGGTTT AAAACTGCGT 3720 

TTGCAGATGC TCTGTTTGCA GGAAGGCTTT AATCACGTGT TCCCCTGGCC CACAAGCAAG 3780 

GCTTTTAGAT CCAGAGCCTC AGTTACTGCC CCCTCTTCCT CTTTGGTGCA ACCAAACGTT 3840 

CAGAATCACG CCTTCTTAGA AAATTCTTAC CCCGGGTGTG TCAATAAGTT AAGTCTAATT 3900 

GGCAACACCT ATCAAAAACT GTTGCATAAC ACACATGCCT CACATAATTG TAGCTTTGCC 3960

TCATCGGGTG TTTTAATGCG GAGGCTTTGA CCTGCAATTT CAAAGATATA CATTCCAAGC 4020 

TTACGCCCAG TTAGTGGATG TGGAAGAAAA AAAAAAGCAA ATTACCTCAT AACACAAAGG 4080

TCAATAACAC ACATCCATAA GCTCCAGGTA CAAAATCTTA CATCTTAGAG AACTATATTT 4140 

AACATTTACA TACATTACTA AGGTTTTTTT TTTCCTTTTG CTTGATTAAA TGTTAGTTAT 4200

CATTAAGTCT TGGAATTATT CTGTGTGTCT ATATTTATTT GCTGTTTGTG AAGAAGCCGG 4260 

TTGTTTTAAA TAAGTTCCTA GAAAATAAGC GCTCAATGTG TTTAATCTGA GTTGCTAATA 4320

TTGTGAAATA TAGGCCACAT AATACTAGCC TAGATAACTA TGGCGAAGTA AGGAGTCTCA 4380 

AACACTGTCC CAGAACAATA GCAATCTGTG TTGAATTTTT ACCCTCTGTG GTAAAATGAA 4440 

GGGAAAAGGA ATGAAGTTTT AGTTTGCCTT AATTTTTATC TTTATTGTTT CAGACTCTTC 4500

AGCAGTATAA AGTTTTCATC AAGTCAAATA TATTCACTTT AAAGTGACTG TGCTTTATTC 4560

TGATACCATC TCCTTCCTAA TTTGGGGGGC CAGCTCAGAT AAGTTTTATG AAATAAAAAG 4620

ATTAAAAATT CTTACATTTT TAGTGTCCTT CCTTGGTAAA ATGTAGAGTT GTCCACTGTG 4680

TTTATCTCCT CCTCCTTATT ATCATGGTTG CTGTTATTAT TTTTAATGGT TCATTAAACC 4740 

CAAGGGTCTG GGAAATACTC ATGGAATTCA TCTCACAGCC TTCACACTGT ATGATATTTA 4800

AACAGGTGGT TGTCCATCTG ATTCTTAAAA TATTTCCAAG AAAAATGATT CCACCTAATG 4860 

CATAAATGCT TTCATCAGAT TAACAGAACA CCATGGACAT TTTATTTTAT TTTATTTTTT 4920 

AAATATTAAC TTCCATTGCA TAAGCTAAAT GGGTAGGAAT AAGTGAGATG ATATTGTTAT 4980

CTAGAGCTTT AAAATATTCA AAGGGCTGTC ATCATTATCT CATTTAATCT TTGAAAACAA 5040

CTCTATGAAG TACAAAGGAC ACTGAGACAT TTGTTGCTCT ATATCAAAGA AAAAAGTGTT 5100 

TGTCCCAAAA CTTCAAAATG TGTAAATTAC ACATTCTGCA TCTTTACAGC TGGAGAAAAT 5160 

TCACTGGCAA TGGAATATTT AAAATTAGAG CTTGCTTAGT GTGCTGCTTC TGATCACTAC 5220 

TTGATCCCAC TTCGTGCTTT CATGTTAATT GGCCCAATTG GACTCTACAG TTGGAAGGTG 5280 

AAAACTTACT ATTTCAACTT GAGTCACGTA TGTATTCTTA TCATATACTT CTTAAAGGTA 5340

CTATTTTTTT TCTTCTGATA GTCACCACAC CAAGCACTTC CAGCCACCCT GCCACAGACT 5400 

TCCTTTGTAA TCACTGTTGA AGGACATGAT GTTTTTATGA CTTCCCGAAA TGAAAACCCT 5460 

ATCTTGTTTT TAAAACAAAC AAACCAACAA AAAGTAGTGT TTATGTAAGC ATTTTGTTCC 5520 

CTGACTCTAG GAACCCCTCT GTTTTTATAT CAACTCTGTA CTGGCAAAAC ACAAAAACAA 5580 

AATGCCACCT TGCTAATTCC CTTCCTAGCA AAGTAATACA GTTTAGCACA TGTTCAAGAA 5640 

AAAAATGGCT AAGAAATTTT GTTTCCACTA ATTATTTTCA AGACTGTGAT ATTTACACTC 5700

TGCTCTTCAA ACGTTACATT TTATAAGACT ATTTTTTAAC ATGTTGAACA TAAGCCCTAA 5760 

ATATATGTAT CCTTAAATTG TATTTCAAAT ATTTTAGGTC AGTCTTTGCT ATCATTCCAG 5820 

GAATAGAAAG TTTTAACACT GGAAACTGCA AGTAAATATT TGCCCTCTTA CCTGAATTTT 5880 

GGTAGCCCTC TCCCCAAGCT TACTTTCTGT TGCAGAAAGT GTAAAAATTA TTACATAAAA 5940 

TTCTAATGAT GGTATCCGTG TGGCTTGCAT CTGATACAGC AGATAAAGAA GTTTTATGAA 6000 

AATGGACTCC TGTTCCACTG AAAAGTAAAT CTTAATGGCC TGTATCAACT ATCCTTTGAC 6060 

ACCATATTGA GCTTGGGAGG AAGGGGAAGT CCTGAATGAG GTTATAAAGT AAAAGAAAAT 6120 

ATTTGCAAAA TGTTCCTTTT TTTAAAATGT TACATTTTAG AAATATTTTA AGTGTTGTAA 6180 

CATTGTAGGA ATTACCCCAA TAGGACTGAT TATTCCGCAT TGTAAAATAA GAAAAAGTTT 6240 
TGTGCTGAAG
CTTCTAATGG 
CAGGTTTTGG 
GATATTACAT 
TAGGAAAGAG 
CCTTCTTCCT 
ACAGTCAGCA 
CTAAAGCGGC 
ACCTTCCAGA 
TCTTTATTTG 
GGAATATAAA
GCAAAGTCAG 
GAGGTTCTCA 
TAAGTAACAT 
AGAGTGACAC 
CTGCTGGATT 
CTGGTAAACT 
AATTATTAGA 
AGATAAATAA 
ACAAAAAATT 
ACAATTGGAT 
AATAAATATC 
CCTGAAATTC 
GGCCAGGTGC 
CCAGCCAGGG 
TGCCTGTAGA 
GTGAGTTCAG 
AAC'iTCCTTC 
TACTCTTTTA 
CTGCAAAATA 
CACTCCTGTA 
TTCAAGACCA 
GCCAGATGTG 
CACTTGAGCC 
GCCTGGGTCA 
ACATCGCTCT 
GTCCATTGCC 
TCCTTTTCAA 
TCTACAGACT 
TTCTTCAACA 
GATTTATAGA 
GATCTCAGGT 
GATTGGAATT 
TTAACTTTTA 
TGTGTCACCC 
GGGTTCAATC 
ACCATGCCTG 
CTGGTCTCGA 
TTACAGGCAT 
ACAATGCTGG 
GTGGTTACTA 
CTAGAATTGC 
GTCCAAATGA 
CTCATGTGTC 
AAAGTATTGC 
CCTGGGAGGG 
AGAATTTTAG 



TGTGACCAGG 
ACTAAGGAGG 
AAGGCACAGA 
TAAAAGAAGT 
CCTGTTTGAA 
CATTCTCTCC 
AGGAATTGTC 
TGCCACCTGC 
TCGCTTCCTC 
GAAAGTCAGC 
CCATCTAGCA 
AGTAGAATTT 
AACTTTTTGC 
TTTTAATTAA 
TTTTTTATTT 
CTCATCTGCT 
CTGTTGTACA 
AATAGCTTTC 
GCTTCTCTTT 
TGCCCATGGT 
AATATAGCAT 
TCTGTTTGAA 
GTCTTTTTGC 
ACTAGCTCAT 
CAACACAGTC 
ACTACTCAGG 
CCACTAAACA 
AAAATAACTT 
CCTGATTTCC 
GCAGGACTGT 
ATCCCAACAC 
GCCTGGGCAA 
GTAGTATGTG 
CAGGAGGTCA 
TAGAGCAAGA 
ATTCAGTTCA 
TTCTCTATCT 
ACTAGGCCAT 
CAAAAACAAA 
CTCACATACA 
ACTGAAAAGT 
AATTAATTAT 
CCACAGCAAG 
ATCTAGTAGT 
AGGCTGGAGT 
AGTTATTCTG 
GCTAATTTTT 
ACTCCTGACC 
AAGCCACCGT 
TTGTATAATA 
GGCCGTTTGC 
AGGTTGGTAG 
GGGAACTGAG 
CTAATTTCCA 
ATAACTAAAT 
TCTTTCTGAG 
ACGAAATAGT 



AAGTCTGAAA 
TGCTTTCTTA 
GCCCCAACTT 
ACTCGTATCC 
GGCGGGCCCA 
CCAGCTTGCT 
TCCCAGTGCA 
TGCAGTCTAC 
TCGCAACAAA 
CATGGCAATT 
TCACTACGAT 
TTTTCTTTTA 
TCTCATGTTC 
AAATAACTAT 
TTACAAGTGT 
TTGCATTCAG 
CTCATGAGAG 
ACTTTAGGAA 
AAACGGAATC 
TAGTCATCTT 
TCCCCGAGAT 
GTTGAATAAC 
CTATATTCAG 
GCCTAGAATC 
TCTACAAAAA 
ATGCTGAGGA 
GAGCGAGACT 
TTTATCTGCA 
AAAGCCCTCC 
TCCACTACAA 
TTTGGAAGGC 
CATGGCAAAA 
CCTGTAGTCC 
AGGCTACAGT 
CCATGTCTCA 
CCCCCACCAC 
ATTCAAATCT 
TTAAACTACA 
AACTTAAAAA 
CGCATTCATA 
TAGGTTTTGA 
GTAGCATGCT 
GATAAACATA 
ATGTTTGTTG 
GCAGTGGCAC 
CCTCAGTGTC 
GTATTTTTAG 
TCAAGTGATC 
GCCCAGCCTA 
AATATGCCAT 
CACATATCAA
AGCTGGAACA 
ACCCTTAAAA 
TCATGAAATT 
TTTTATGTCT 
GTGGTTTATA 
CAAAGCATTT 



ATGAAGAGAG 
AAGTCAGAAA 
TTACGGAAGA 
TCTGCCACTT 
AGGAGTGCCG 
GAGCCCTTTG 
TTTTGCCCTC 
ACAGCTTCGG 
CTATTTGTCG 
AGAGGTAAAT 
GAGCAGTCAG 
TCAGATATGG 
CCTTTACACT 
GTACTTTTTT 
TTTAACTGGT 
ACTACTGCAA 
AATGGGTGAA 
CTCCCTGAGA 
TCAAGACAGA 
GTGAAATCTG 
AATTTTCTCT 
AAAAATTAGG 
CTACTTTACG 
TCAGGCAGGC 
AATAAAAAAT 
CTGCTTGAGC 
TTCTCAAAAA 
ATGTTTTCCT 
ATAATCTAAT 
TCCAAAAATC 
CAAGGCAGGT 
ACCCTGTCTC 
CAACTACTCA 
GAGCCATGTT 
AAAAAAAAAA 
AACATTGTTT 
TTAAGCATTC 
TCAGTTCCAT 
CTTATTTTTT 
ATAAGATGGC 
TCTTGTTGCT 
CCCTCATTTC 
ATCATAGTTG 
TTGTTGTTGT 
GAACTCGGCT 
CCAAGTAGCT 
TAGAAACAGG 
CAGCCGCCTC 
ATAGTATGTT 
AAATATTTAC 
TGGTTCTCTC 
GACCTTAAAG 
TTAAGTGACT 
CTACCATTCA 
GTTTTAAAGA 
ACTCTTAAAA 
TTATCCAATG 



ACAGATGACA 
GAGATACTCA
AAAGATTTCA 
TATTTCGACT 
ACAGCAGTCT 
CTCCCCTGGC 
CTGGCTGCCA 
GAAGAGGAAA 
CAGGTAAGAA 
AAGCTAGAAA 
TATCAACATA 
GAGAGTATCA 
AAGCACATCA 
AACAACAAAA 
TTAATAGAAG 
TATTGCACAG 
AAAGACAAAT 
ATTGCTGCTT 
ATCAGTTACA 
CCACACCTTT 
CACAATTAAG 
ACCCCCTAAA 
TTCTATTAAA 
CTGAGCCCAG 
TACCTGGGTG 
CCAGGATAGC 
AACAAACAAA 
ATTGCCTGTG 
CCGACTTTAC 
ACAGGTTGGG 
GGATTGCTTC 
TCCAAAACAT 
AAAGGCTAAG 
TACTGTGTCA 
AAAGAAAAGA 
TGATTATCAC 
TTTGAGATTC 
TTTGATTTTC 
AAGTTTTCTG 
AGAATGTTCA 
GTCAAGATGA 
ATCCCATACC 
CTTTTCAAGT 
TGTTTGAGAT 
CACTGCAACC 
GGGACTACAA 
GCTTCACCAT 
GGCCTCCCAA 
TTTAAACTCT 
TGTCTTAGAA 
CTTACAGCTT 
ATTGACTAGC 
TGCCCCAGAC 
CTAGCCTCTG 
ACAAATTGTC 
AAAAAAAAGT 
GATCTATAAT 



AAAGAAGATG 6300 
GAAAGAGGTA 6360 
TGAAAATAGT 6420 
TCCATTGCCC 6480 
CCTCCCTCCA 6540 
GACTGCCTGG 6600 
ACTCTGGCTG 6660 
GGAACCTCAG 6720 
ATATCATTCC 6780 
GCAATTGAGA 6840 
AGAAATATAA 6900 
CTTTAGAGGA 6960 

CATGTTAGCA 7020 

AAAAGCATAA 7080 

CCATATAGAT 7140 

AATGCAGCCT 7200 

TACGTCTTAG 7260 

TAGAGTGGTA 7320 

TTAAAAGCAA 7380 

GGACTGGGCT 7440 

GAAAGGGCTG 7500 

TTTTAGGGCT 7560 

TCTTCTTTCA 7620 

GAATTTGAGA 7680 

TGTTGGTGCA 7740 

CAAATCTGTG 7800 

AAAACAAACA 7860 

AGATTAAATT 7920 

CTTGTGTTCA 7980 

TGCAGTGGCT 8040 

AGCTCAGGAG 8100 

ACAAAAATTA 8160 

GCAAGAGGAT 8220 

CTGCACTCCA 8280 

AAAGAAAAAA 8340

ATAAATGCTG 8400 

AACTCAATTC 8460 

TTGCTTTGAG 8520 

CTACTCTCAC 8580 

AGGATAAAAT 8640 

CTACCTACCT 8700 

TATTCAACAG 8760 

TCAAGGCATT 8820 

GGAGCCCTGC 8880 

TCTGCCTCAT 8940 

GGCACATGCC 9000 

GTTGGCCAGG 9060 

AGTGCTGGGA 9120 

TAGTGGCTTA 9180 

TTATGAAGAA 9240 

TAATTAGAGT 9300 

CAACTTCCTT 9360 

AAAACTGGAA 9420 

GCTAGTTGTC 9480 

ACTGCTTACT 9540 

CAGTAGTCTG 9600 

TTTCATAGAT 9660 
TAGAGTTAAA
TGGGGATGAG 
ATTGTAAGAA 
AGGGAAAAAT 
TTCCCTCCTG 
AAAAATATAT 
ATASCAAACT 
AGGTGTTGAA 
GAACAGACCT 
CCATGAACAG 
GAGCAGAGAA 
GCAGGGTTTG 
AGAGAATAAC 
TATTAAAATA 
TCGAGAGGAG 
ACAAGGGAAA 
TAACTAGGAC 
TGGAGCCAAA 
CCCTAACTCT 
ACTTTTCCCC 
TACCCAAACT 
TAAGCTTATA 
CTTTCTCTTT 
ACTTGAGCTG 
CCAAGCCAAA 
ATCAGCTGAG 
CATCAACTCT 
GTTCAATCAG 
TTTGCTGCCC 
AAGCAGTTCA 
TTTGTGTTTT 
ACCTCAAGCA 
TGCACACGGA 
TTGACCATAT 
ACTCTTTAGT 
AAAGTATACT 
GTTTATATGG 
TAATTTCAAA 
GTGTCTCTTA 
TAAATAACTC 
TAAATAATTT 
GCAAAAAAAA 
AAAACAATAT 
CAAAACACTA 
GGTTGTTGTA 
AAGAAAAGTT 
GCCAAGGCGG 
AAACCCGCCC 
CTGTGATCCC 
GACCAGCCAT 
CATGACTGTA 
TGGAGGTTGC 
ACTCTGTATC 
GAAATGCATT 
GGCCTCAAAC 
TCTTCTTGGT 
ACAGTCATCA 



TCAAAGAAAC 
AACACACTAC 
AATAAGAATG 
GGAGAAGACA 
TTCTTCCTCC 
ATATTCTACA 
TTTATTCATA 
AATTCAGGGG 
ATTTTTAATC 
GTGTGAAGGA 
GTTTTCCTAG 
CAGGCAAAAA 
TGTGGCAACA 
GAAGATGCCA 
ATACCAAATT 
GATTAGACAA 
ATGTTTTGAA 
TTAAAATTTG 
CTACTTTGTA 
TCCCATTCTG 
GTTCTACGGT 
TGTAAATTAG 
CGACTTCTGT 
TATTTTCGTT 
AACTACGAAG 
AAATCCCGCT 
TAATTGTTAA 
GTCCATTCTT 
AGGCTGAAGT 
CCCTCCCGAG 
CAGTAGAGAC 
ATCCACCCAC 
CCAGATCCAT 
CTTTCTCCAA 
CTGCTTACAC 
GCAAGTTACA 
GCAAAATGGA 
GCTGTCAGGT 
TACTTGGTAG 
AGAAGTTCTT 
TATATATGAT 
AAAAAAAAAA 
TAAAAAGGCC 
GTAATAATGC 
GGTTTATGTA 
AACTTTGGCT 
GCAGATCACC 
CCCTCACTAC 
AGCACTTTGG 
GGAGAAACCC 
ATCCCAGCTA 
AGTGAGCCGA 
CAAAAAACAA 
CTTTCTGTAA 
AACCCATATT 
GGAAGCTCTT 
CTTTCTGTGG 



ACGGATGAGA 
TTGTAATCAG 
AAGAATTCAA 
TTAGAAAAAT 
TTCTCATTGG 
CATCCCTTTC 
CAACATTTAT 
AAAAAAGACA 
AAAGTAATCT 
GGTAGGACTC 
AGGAACTATT 
AAAAAAAAAG 
ATGGAGGAGA
GGGGTAATGA 
CTGGAGACAT 
AGGAGTTAAG 
AAGTAATGTA 
TACATGTATA 
GCCAGACTTC 
TCCTAGATAT 
TGCCCAAAAC 
GAGCTCTACA 
GACACATCTC 
CTTCTTTCTT 
TCATCCTCAG 
GTTTAGTATC 
AATTACTTCA 
TTGTTCTTGG 
GCAGTGGAGC 
TAGCTGGGAC 
AGGGTTTCAC 
CTCAGCCTCC 
TGTTTATGTT 
TTTAAGTCAG 
AAGGCCTTTG 
TTTTATGTGA 
AATAATGTTA 
CAAATGAGTT 
AATTATCTGC 
CAGACATACA 
AATGTTATCC 
AACTCCAAAT 
AAAGTACCAC 
TGAGAAAGTT 
CTCCAAGAAT 
GGGCGCAGTG 
TGAGGTCAGG 
TAAAAGAATA 
GAGGCCGAAG 
GTCTCTACTA 
CTCAGGAGGC 
GATCGTGCCA 
AAGAAAAGAA 
AATGTGACTA 
TTCTCGGTCT 
CCTCTGGCCT 
TAACCTTCTC 



AAGGAAGAGG 
TCATAGATGT 
AATCAACACA
TATTCTATTT 
TTTTCAGGTG 
TACGCTGTTG 
TGAGTTCTTA 
ACTCATTGTC 
CAATTTAGGG 
TGAGGAGAGA 
AAAGCTGGGA 
GCAGGGGAAG 
GTCTGGAAGC 
GGGCTTGATT 
TTCTGAGTTA 
AATGACTCCC 
TTGGATCTCT 
TAACTCTCCC 
CTAAAAGAAT 
TTGTCCACCT 
TTCCTAATTG 
GTTTGATTTC 
AGATTTACAA 
GATGAATGAG 
TTCCTCCTTC 
TCTTGAATTC 
GTACTTGTTG 
TGGTGGTGGT 
ACTTCACTGC 
TACAGGTATG 
CATGTTGGTC 
CAAAGTGCTG 
GCTTCTAGAG 
TATTTTTTTT 
TAGTCTGACT 
ATTGAATTAG 
ACTCTTCCAA 
ATAAACTGTT 
TTCCATGTCA 
GGTTATTATT 
AAGTGCTAAG 
AAATATGTTG 
CATAATAGGC 
GAAAAAAGAA 
ATCTCCTCTC 
GCTCTTGCCT 
AGTTTGAGAC 
CAAAATTAGG 
CAGGAAGATC 
AAAATACAAA 
TAAGGCAGAG 
TTCCACTCCA 
AAGGTAACCT 
CATTTGCCTT 
CCCCGCTGCC 
TGAAAATGCC 
CAGCACCATC 



AAAATTGAGG 
ACTGAGAACT 
TGAAATAAAA
TTAAAATTCT 
GAGGGAAAGT 
TCA7GGCAAC 
CTGTGTGGTA 
TTAAAACTCA 
TAGTAAGAGC 
ATAGTTAGCT 
GTTACGGGAT 
GGGAAGTTCT 
AAGAAAACCA 
TAAAACAGTG 
GAACCTACAG 
AGGTTTCACT 
TACCATTGGA 
CCCACCACCA 
AGTTTGTAGT 
ACCATCTGCT 
CCAAATTCAA 
GAGCAGCCCC 
AACTGAACTA 
GTAACCACTC 
TTCTGTTTGA 
ATTACCTTAA 
TCTGACCTCT 
GGTGTTGACA 
AACCACAGCC 
TGCCACCACA 
AGCCTGCTCT 
GGATTACAGG 
TGAGTTTTTA 
TTCAGGAAAA 
CTTCTTTCCA 
GCAACGGTAT 
ATAGTTTATC 
AACACTATTG 
TTATTATGTA 
GTGCTTTTTA 
GGATGTATTG 
AAACCAAGTT 
TGTGTGGAGA 
AGAAAGCAAC 
AAACTTTTAC 
GTAGTCCCAG 
CAGCCTGACC 
CCGGGCACAG 
ACCTGAGGTC 
ATTAGCCGGG 
AATCACTTGA 
GCCTGGGCAA 
TGAACTATGT 
ATTTATGGTA 
TAGCCTTTGT 
TGCTTCTCTT 
AAACAGAAAG 



AGAGGAGGAA 
AACAAGAAGA
ACAAACTACT 
GTTTTCAGGC 
TTAAGATGGA 
AAGGTTTATC 
AGCTCTTTCC 
GATGAAAGCT 
TATTTAAGAA 
AGGAATGAAA 
GAAAGATGAG 
GGCCTGGCAG 
AGTAGAAGAG 
CTGTTGGAGA 
TATTTATCAG 
TTGGCGCAGG 
ACTATGTATG 
GTAACTACTT 
CACTGTCTTT 
GCCTCCACTT 
TGAACAAGTT 
TCCTGAAACC 
ATTATTTTAC 
AACAAATTGC 
CCCACAACAG 
TTTATAGCCT 
GTCCAATCTT 
GAGTTTCGCT 
TCCTGGGTTT 
CCCAGCTAAT 
CAAACTCCTC 
CATGAGCCAC 
AAACACAAAT 
AACAGTTCAA 
AGCTTTCATC 
AAAAATTATA 
TAGAATGACA 
CCACATGCAA 
AATTAGACTT 
AACATAATTT 
TTACTGCTGT 
TATATGCAAG 
CGGCAGGCTA 
AATATGCTTT 
GTTTTTTCCA 
CCTTTGGGAG 
AAAAATGGAG 
TGGCTTACCC 
AGGAGTTCGA 
CGTGGTGGTG 
ACCCAGGCAG 
CAAGAGCGAA 
GAGATCTTTA 
AAAATGTTGA 
TCACATTGCT 
TCAAGGTAGC 
AATGAATCTC 



9720 
9780
9840 
9900 
9960 
10020 
10080 
10140
10200 
10260 
10320 
10380 
10440 
10500 
10560 
10620 
10680 
10740 
10800 
10860 
10920 
10980 
11040 
11100 
11160 
11220 
11280 
11340 
11400 
11460 
11520 
11580 
11640 
11700 
11760 
11820 
11880 
11940 
12000 
12060 
12120 
12180 
12240 
12300 
12360 
12420 
12480 
12540 
12600 
12660 
12720 
12780 
12840
12900 
12960 
13020 
13080 
TTGTAAATTC AGCTCTTACG TCATTCATTA CATTATTTTG TAACTCTTTA TAGATTCTTC 13140

TCTCCCACTA GACTCTGAGT CACTGGAGAG TAGGAGCCAA CTCTCATTCA TGTGTGGTTT 13200 

GGTCAGCTAC TGGCCACATT CCTGATGCAT AGTTAATGCT CAAACCTTAA CTGGTGAATC 13260

AGCTCAAATA TTGTCCTTCT CTAAATCCAT TCACTCATTG ACTAACTATG TACTCAAAAT 13320

AGTAAACACC A3TAATTTAA TCCAATTCCT GCCCATACTC CTTCGTACAT TTCAGGTGAA 13380

TTAGTTTGAT AAATATGTGT GTATTACATA ATATTAAAGT ATGTACAGM GATCATGCTA 13440

ATCATAA?TC ACAACTGATA ACTAATCAAA CATAAATGCT CTCAGGTTAA CAAATGTCTG 13500 

CCTTCTCAGT TAATGCAGTC ATTAACAAAC ACCTTCTGAT GCTGATAATA GGGCCTTGTT 13560 

CAGCAATGAA GCCATAAAGG TGAATAAAGA ACATGCCCTC GTGGAGCTCA CAGCCTAGTC 13620

ATTATTGTTC TGATTTTTAA TATTAATGTT GGTTTGGGTT TTGGTGAAAA ATGTTTAGAC 13680

TTATCTTAGT GATCTTTTCA TCCTTTGCTA TATTATTTTT CTCTAAGAGT CTTCCTTATC 13740

CCCTCCTTTA AAAAACTAGG TGATAATTCT AAATTGTAAA TTTAAATATT ATAAATAGCT 13800

TATAAAATTT AATATTTATA ATATTTAAAT GTTTGATAAA TATTTAAATT TTATAATATT 13860 

TAAATGTTTA TTTAAATTCA TTTGTACATC AGTTTTTATT TTATTTAAAT GTGTTGGCCA 13920 

GGCATGGTGG CTGACACCTA TAATCCCAGA ACTTTGAGAG GCCAAGTCAG GCAAACCATT 13980

TGAGCTCAGG AGTTTGAGAC CACCCTGGGC AACGTGGTGA AACCC7GTCT CTACCAAACA 14040 

TATGAAAACT TATCTGGGTG TGGTGGCACG CATCTGTGGT CCCAGATGGG AGTCCCAGGC 14100 

TAAGATGGGA GAATCGCTTG AACCCAGGTG AGAGGGGTGG GGTGGATGTT GCAGTGAGCT 14160 

GAGATCGTGC CACTGCACTC CAACCTGGGT GACAGAGTGA GACTCCATCT CAAAAAAAAA 14220 

AAATGTTATC TAAATAAGAT AAATTTAATA ACTGTTCGCA CTTACATGAG CATAAGGAAC 14280 

TAAACCTAGA TAAAACTATC AAATAAGGCC TGGGTACAGT GACTCATGCC TGTAATCTCA 14340

AGCACTTTGG GAGGCCAAAA TTATACAAAG TTAGTTGTAT AACACCAACT AACAACTATT 14400 

TTGCCGTTAC CTTAATTCAG ATTAATTTTT TTTAAACTGA GTTTTAAATT CCTGCTTACT 14460

CTACCATACA TGCTAGGCCT CATATTATGC TAGAAAAATT TTGAGCACAG ATTTATGAAT 14520

ACTCTCCTGC ATACCATTTA ATTTTTAAAC AAATTTTAAT GCAGTATATA TGTGCCTTTT 14580

TACCAACACA TTAAATAATA AGATCTACTG TGAGGACTAA ATTTCTGTAA TTTCAAAGTA 14640

GTAATGAGTT TAAACCATGT CTCAAGATCT CTGCAATAAC TGTAGCACAA CAGAAAATAG 14700

GTATTTCTAT TAATGACAGA GTCACAAGTA CTACTAATAA TACTGTGGTT TGTTTCCTGC 14760

AACTAATCA? GGGAGGAATG CTAAATTTCA GAGGTTGGTG AAAATACATG TGTATTTTTT 14820 

TCCCCATCCA AGTTCACAGA TTTCTCACAC TGAGAACTCC TATTCCATAA CAAAATTCTG 14880 

GAAGCCTGCA CACCGTATTG GAAGAAGGGC AGAAAGGAAA AGCAAATGGA AGGATTTAAA 14940 

TTTTTTTCAA ATCCTGTATC CCTTGATTTT ACACCAAGAT TGTATTTATG TATTACTTGT 15000

GTTAAAAATA TAGTATAATC GAGACTCCAG ATCAAAAATC ACCGCAGCTC AGGGAGAAAG 15060 

AGGGCCACCA AATGCCAGAG CCCTTCAGCC TTCTCCCACC CTGCCTGTAC CCTCAGATGG 15120 

AAGCACTTTT TTATCATTGT TTCACCTTTA GCATTTTGAC AATGAAGTCA CAAACCTTCA 15180

GCCTCTCACC CATAGGAACC CACTGGTTGT AAGAGAAGGA TGAAGCCAGT CCTTCCTAAA 15240 

GGGCACGATT AGATGTGTTT ATGGCATCCT CAGGTGAAAC TATATTTATA TTGACAATAT 15300 

ATTTATATTT CTCAAGGAAT ACTAGAATAA TGATTCAGTT CAGTACTAGG CCATTTATCT 15360

ACCCTTTATA ATATTGTTTA ATGAGAAAAT GCTTTCTATC TTCCAAATAT CTGATGATTT 15420 

GTAAGAGAAC ACTTAAACAT GGGTATTCAT AAGCTGAAAC TTCTGGCATT TATTGAATCT 15480 

CAAGATTGTT CATCAGTATA CTAGGTGATT AACTGACCAC TGAACTTGAA GGTAGTATAA 15540 

AGTAGTAGTA AAAGGTACAA TCATTGTCTC TTAACAGATG GCTCTTTGCT TTCATTAGGA 15600 

ATAAAG ATG GCT GCT GAA CCA GTA GAA GAC AAT TGC ATC AAC TTT GTG GCA 15651

Met Ala Ala Glu Pro Val Glu Asp Asn Cys Ile Asn Phe Val Ala

-35 -30 -25

ATG AAA TTT ATT GAC AAT ACG CTT TAC TTT ATA G GTAAGGC TAATGCCATA 15702
Met Lys Phe Ile Asp Asn Thr Leu Tyr Phe Ile Ala
-20 -15 -10 

GAACAAATAC CAGGTTCAGA TAAATCTATT CAATTAGAAA AGATGTTGTG AGGTGAACTA 15762 

TTAAGTGACT CTTTGTGTCA CCAAATTTCA CTGTAATATT AATGGCTCTT AAAAAAATAG 15822

TGGACCTCTA GAAATTAACC ACAACATGTC CAAGGTCTCA GCACCTTGTC ACACCACGTG 15882 

TCCTGGCACT TTAATCAGCA GTAGCTCACT CTCCAGTTGG CAGTAAGTGC ACATCATGAA 15942 

AATCCCAGTT TTCATGGGAA AATCCCAGTT TTCATTGGAT TTCCATGGGA AAAATCCCAG 16002 

TACAAAACTG GGTGCATTCA GGAAATACAA TTTCCCAAAG CAAATTGGCA AATTATGTAA 16062 

GAGATTCTCT AAATTTAGAG TTCCGTGAAT TACACCATTT TATGTAAATA TGTTTGACAA 16122 

GTAAAAATTG ATTCTTTTTT TTTTTTTCTG TTGCCCAGGC TGGAGTGCAG TGGCACAATC 16182

TCTGCTCACT GCAACCTCCA CCTCCTGGGT TCAAGCAATT CTCCTGCCTC AGCCTTCTGA 16242 
GTAGCTGGGA
AGACAGGGTT 
CTGGCTCGGG 
ATTGATTCTT 
TTTGAAACCT 
GCAAAATATC 
AGCCTAGGAA 
ACAGTAGACC 
TTTCTTAGGT 
AATTTATTTT 
TTACCTGAGA 
TAATATTCTG 
AGGAGTAGCA 
ACATATTCTC 



AGAAATGAAT 
ATTGTCAGCT 
GTGCACTCAG 
CTAGCAAAAG 
AGACTTTGCC 
GGAGCGTTCC 
CCTCTCAGAA 
AGAAAATTCT 
CCAAAAGCTG 
AGAGTAGGAG 
CGTTCTCTCA 
AGCACAATAT 
ACAGTACCTG 
AAGTACTATG 
ACCCA7CATC 
GTTCATATCC 
TCTAATTGGA 
ATGGCCACAC 
GACTCTTTAA 
AATTGATGGT 
TTACAAGAAA 
GTGAATGGAT 
GAAAGGAGAA 
TTCAGTGAAA 
GGATGACGCA 
CAAACTCTTC 
TTGTAGGCTT 
GAAATGTGTA 
AAAGGATGCA 
AGAAAAACCA 
TGTTGTGTTG 
CAACAAGGAG 
GATTGCAATG 
AAAGCTTGGC 
ATGGAGCAGT 
GTGTTAATTG 
AAAAACCACC 
TATGTATATA 
TAAGCTGCTT 
CAGTTATATT 
TCAATTCTTA 



CTACAGGTGC 
TTGGCATGTT 
CTCCCAAAGT 
ATGATTAATC 
TCATTTAAAA 
CTGTGGACAC 
TTTGAGCCTG 
CTGACACACA 
GACTTTCCGT 
TAGTTACATA 
ACACACTAAG 
ATGAAAGCCA 
AAAGTAAAAG 
TTTCTCTCTT 



TTATTTTTCT 
GAGGAAAAAA 
TAGCACAGCT 
ATGCTTCTCT 
TGTTTCATTG 
ATAGGAAAAA 
ATGCTTTGGG 
GGAAGTAGAG 
AAAGAAACCA 
TAGGAGACTG 
CCCCAAGAT3 
GTATTAGCTA 
AATAAGATAG 
GCTGGAAGCA 
CCCTAGTTGT 
CAGTTATCAA 
AGTTAAACAC 
CAAGTGCAAG 
AATTCAGAAA 
GTGGGGTGAG 
TGATGGTGTC 
TTTAGAAACA 
GAAAGTAGAA 
TAGGGAACAC 
TTTCGTTTTG 
TACATGTGGT 
ATACATAGAA 
AAGTGAGAGA 
GTAGAAAGAA 
AGAGAATTCC 
AATTTTGCAG 
TTTGGTGATC 
AGTGAAACAG 
TGTTAAAAGG 
TTTAAATCTC 
TAACTAATTG 
CAGAAATAAT 
TACAGACACA 
TTTCTAGTTA 
AAGTTCATGA 
ATTGGTGAAT 



ATCCCGCCAT 
GTCCAGGCTG 
GCTGGGATTA 
TCCTGTGAAC 
GCCTGAGCAA 
CTCCTACCTT 
CAGTGAGCTA 
CACACAAAAA 
TTAAGCAATA 
TTGAAATTTT 
TCTGATAAGC 
AGACAGACCC 
CTAGAATGAG 
TTTCCCCCTC 



TTGCAAACTA 
AAAAATGGTT 
TTGGAATGAA 
ATGCCTTAAA 
GTCCTAAGAT 
GGGATTGAAG 
AAGAAGCCTG 
GAGATAGGAA 
TGGCATTTAT 
GTGAGAGGAG 
TGAAATTTGG 
GGGTAAAGAT 
AGAATTTTTC 
ACCTGATGAT 
TGATCTCACT 
GAAAGGGTCA 
ATCAATCCCC 
GAAATCTGGA 
TAATATATTT 
GAGGCCAAAA 
ATGAATTAAG 
CTTGAGAGAA 
AAGATGATGC 
AGGAGGAAGA 
GATCTGAGAT 
TCTGAGTTCA 
ATGGCATTTG 
GGAAAAGCCA 
GCTAATAAAC 
ACCGACTCCC 
CCTTGAGAAT 
TCAGTGAAAG 
TGAATGGGAA 
AGGAGAGAAA 
AAAATAAAGA 
AGGCAATGAA 
GATAGCTACC 
GAAATGCTTA 
GTGATATATA 
TATTTCACAA 
GTTTGTTTCC 



TGATCCCACC 
AAAACCTTCA 



CTAGAGTGA1 
ACTTTATCTH 



GCCTGGCTAA TTTTTGGGTA TTTTTACTAG 

GTCTTGGACT CCTGATCTCA GATGATCCTC 

CAGGCATGAA CCACCACACA TGGCCTAAAA 

AATTTGGCTT CATTTGAAAG TTTGCCTTCA 

CAAAGTGAGA CCCCATCTCT AC A AAA A ACT 

CTGTGGAGGC TGAAGCAGGA GGATCACTTG 

CCTACACTCC AGCCTGCATG 

TAAAAAATTA TTAGTTGACT 

AATTTAAAAG TAAAATCTCT AATTTTAGAA 

TAAACCCTAG GT7TAAGTTT TATGTCTAAA 

TTCATTTTAT GGGCCTTTTG GATGATTATA 

TTAAACCATA AAAATAGGAG TTCGAGAAAG 

ATTCAATTCT GAGTCGAAAT ACAAAATTTT 

TTAG CT CAA GAT GAT G GTAAAGT 

Ala Glu A3p Asp Glu 
-10 

AGTATCTGCT TGAGACACAT CTATCTCACC 

CTCATGCTAC CAATCTGCCT TCAAAGAAAT 

GATGATCATA AGAGATACAA AGAAGAACCT 

AAATTCTCCA GCTCTTAGAA TCTACAAAAT 

TAGCATGAAG CCATGGATTC TGTTGTAGGG 

CATTAGAATT GTCCAAAATC AGTAACACCT 

GAAGGTTCCG GGTTGGTGGT GGGGTGGGGC 

TGGG7GGGGC AAGAAGACCA CATTCAGAGG 

GATGAATTCA GGGTAATTCA GAATGGAAGT 

AAACAGGGTG TAGAGCAAGA 

GGAGATAATA GGGTTAATTA 

TAGTTTGTTG TAACAAAGAC ATCCAAAGAT 

TCTCAAAGAA AGTCTAAGTA GGCAGCTCAG 

ATTGGGACCC CCAACCTTCT TCAGTCTTGT 

CACATAGTTG AAAATCATCA TACTTCCTGG 

AGAGAAGTCA GGCTCATTCC TTTCAAAGAC 

CTCATATTCC ATTGACTAGA ATTTAATCAC 

AAATATAATC TTTATTCCAG GTAGCCATAT 

TTAAAATATC ATTCTGGCTT TGGTATAAAG 

TTAAGGGTTG AGAGCCTATT ATTTTAGTTA 

GTAGACATAG GGGAGTGCTG ATGAGGAGCT 

TCAATAGGAC ATGATTTAGG GTTGGATTTG 

CTACATTTTT CACTTAGGCA ATTTGTACCA 

GCAGGTTTTG GTGTATACAA AGAGGAGGAT 

GTCTGTGGAA CGTCCTAGTG GAGATGTCCA 

GGACACAGAT TTGGGCTGGA GATAGAGATA 

AATCTATAGA GATAAAAAGA CACATCAGAG 

AGTACTGTGC TGGGGGGAAT ACCTACATTT 

AACAGAGAGC AGACTAACCA AAAGGGGAGA 

AGGAGAGCAT TTCAAGATTG AGGGGATAGG 

CAAGGGCCAG AACACAGCTT TTAGATTTAG 

CAGCTTGATG GTGAAATGGA GGCAGAGGCA 

GTGAAGAAAT GATACAGATA ATTCTTGCTA 

CAAGACTAGC TGCAAAGTGA GATTGGGTTG 

GCTTTGTGCT TTTTTGATTA TGAAAATAAT 

AAAAGATAAT AATATGAAAG ATAAAAATAT 

ATTTTGATAC AATATTTCTA CACTCCTTTC 

TATTTTTATT AAAAGGGATT GTACTATACC 

TGGACATCTC TCCATGGCAA CGAGTAATTG 

TAAGGGCATA TCTTTGCCCT TTTTATTTAA 

AGTTTGTTGT TGTTATTAAC AATGTTCCCA 



16302 
16362 
16422 
16432 
16542 
16602 
1 6662 
16722 
16782 
16842 
16902 
16962 
17C22 
17075 



17135 
17195 
17255 
17315 
17375 
17435 
17495 
17555 
17615 
17575 
17735 
17795 
17955 
i79l5 
17975 
18035 
18095 
18155 
18215 
18275 
18335 
18395 
19455 
18515 
18575 
18635 
18695 
18755 
18815 
18875 
18935 
18995 
19055 
19115 
19175 
19235 
19295 
19355 
19415 
19475 
19535 
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TAAGCATTCC TGTACACCAA TGTTCACACA TTTGTCTGAT TTTTTCTTCA GGATAAAACC 19595 

CAGGAGGTAG AATTGCTGGG TTGATAGAAG AGAAAGGATG ATTGCCAAAT TAAAGCTTCA 19655 

GTAGAGGGTA CATGCCGAGC ACAAATGGGA TCAGCCCTAG ATACCAGAAA TGGCACTTTC 19715 

TCATTTCCCC TTGGGACAAA AGGGAGAGAG GCAATAACTG TGCTGCCAGA GTTAAATTTG 19775

TACGTGGACT AGCAGGAAAT CATTTGCTGA AAATGAAAAC AGAGATGATG TTGTAGAGGT 19835

CCTGAAGAGA GCAAAGAAAA TTTGAAATTG CGGCTATCAG CTATGGAAGA GAGTGCTGAA 19895 

CTGGAAAACA AAAGAAGTAT TGACAATTGG TATGCTTGTA ATGGCACCGA TTTGAACGCT 19955 

TGTGCCATTG TTCACCAGCA GCACTCAGCA GCCAAGTTTG GAGTTTTGTA GCAGAAAGAC 20015 

AAATAAGTTA GGGATTTAAT ATCCTGGCCA AATGGTAGAC AAAATGAACT CTGAGATCCA 20075 

GCTGCACAGG GAAGGAAGGG AAGACGGGAA GAGGTTAGAT AGGAAATACA AGAGTCAGGA 20135

GACTGGAAGA TGTTGTGATA TTTAAGAACA CATAGAGTTG GAGTAAAAGT GTAAGAAAAC 20195 

TAGAAGGGTA AGAGACCGGT CAGAAAGTAG GCTATTTGAA GTTAACACTT CAGAGGCAGA 20255 

GTAGTTCTGA ATGGTAACAA GAAATTGAGT GTGCCTTTGA GAGTAGGTTA AAAAACAATA 20315 

GGCAACTTTA TTGTAGCTAC TTCTGGAACA GAAGATTGTC ATTAATAGTT TTAGAAAACT 20375 

AAAATATATA GCATACTTAT TTGTCAATTA ACAAAGAAAC TATGTATTTT TAAATGAGAT 20435 

TTAATGTTTA TTGTAG AA AAC CTG GAA TCA GAT TAC TTT GGC AAG CTT GAA 20486 
Glu Asn Leu Glu Ser Asp Tyr Phe Gly Lys Leu Glu 
-5 1 5

TCT AAA TTA TCA GTC ATA AGA AAT TTG AAT GAC CAA GTT CTC TTC ATT 20534
Ser Lys Leu Ser Val Ile Arg Asn Leu Asn Asp Gln Val Leu Phe Ile
10 15 20

GAC CAA GGA AAT CGG CCT CTA TTT GAA GAT ATG ACT GAT TCT GAC TGT 20582
Asp Gln Gly Asn Arg Pro Leu Phe Glu Asp Met Thr Asp Ser Asp Cys
25 30 35

AGA G GT ATTTTTTTTA ATTCGCAAAC ATAGAAATGA CTAGCTACTT CTTCCCATTC 20638
Arg Asp
40

TGTTTTACTG CTTACATTGT TCCGTGCTAG TCCCAATCCT CAGATGAAAA GTCACAGGAG 20698 

TGACAATAAT TTCACTTACA GGAAACTTTA TAAGGCATCC ACGTTTTTTA GTTGGGGTAA 20758 

AAAATTGGAT ACAATAAGAC ATTGCTAGGG GTCATGCCTC TCTGAGCCTG CCTTTGAATC 20818 

ACCAATCCCT TTATTGTGAT TGCATTAACT GTTTAAAACC TCTATAGTTG GATGCTTAAT 20878 

CCCTGCTTGT TACAGCTGAA AATGCTGATA GTTTACCAGG TGTGGTGGCA TCTATCTGTA 20938 

ATCCTAGCTA CTTGGGAGGC TCAAGCAGGA GGATTGCTTG AGGCCAGGAC TTTGAGGCTG 20998

TAGTACACTG TGATCGTACC TGTGAATAGC CACTGCACTC CAGCCTGGGT GATATACAGA 21058 

CCTTGTCTCT AAAATTAAAA AAAAAAAAAA AAAAAACCTT AGGAAAGGAA ATTGATCAAG 21118 

TCTACTGTGC CTTCCAAAAC ATGAATTCCA AATATCAAAG TTAGGCTGAG TTGAAGCAGT 21178 

GAATGTGCAT TCTTTAAAAA TACTGAATAC TTACCTTAAC ATATATTTTA AATATTTTAT 21238 

TTAGCATTTA AAAGTTAAAA ACAATCTTTT AGAATTCATA TCTTTAAAAT ACTCAAAAAA 21298

GTTGCAGCGT GTGTGTTGTA ATACACATTA AACTGTGGGG TTGTTTGTTT GTTTGAGATG 21358

CAGTTTCACT CTGTCACCCA GGCTGAAGTG CAGTGCAGTG CAGTGGTGTG ATCTCGGCTC 21418 

ACTACAACCT CCACCTCCCA CGTTCAAGCG ATTCTCATGC CTCAGTCTCC CGAGTAGGTG 21478 

GGATTACAGG CATGCACCAC TTACACCCGG CTAATTTTTG TATTTTTAGT AGAGCTGGGG 21538 

TTTCACCATG TTGGCCAGGC TGGTCTCAAA CCCCTAACCT CAAGTGATCT GCCTGCCTCA 21598 

GCCTCCCAAA CAAACAAACA ACCCCACAGT TTAATATGTG TTACAACACA CATGCTGCAA 21658

CTTTTATGAG TATTTTAATG ATATAGATTA TAAAAGGTTG TTTTTAACTT TTAAATGCTG 21718 

GGATTACAGG CATGAGCCAC TGTGCCAGGC CTGAACTGTG TTTTTAAAAA TGTCTGACCA 21778 

GCTGTACATA GTCTCCTGCA GACTGGCCAA GTCTCAAAGT GGGAACAGGT GTATTAAGGA 21838 

CTATCCTTTG GTTAAATTTC CGCAAATGTT CCTGTGCAAG AATTCTTCTA ACTAGAGTTC 21898 

TCATTTATTA TATTTATTTC AG AT AAT GCA CCC CGG ACC ATA TTT ATT ATA 21949
Asp Asn Ala Pro Arg Thr Ile Phe Ile Ile
40 45

AGT ATG TAT AAA GAT AGC CAG CCT AGA GGT ATG GCT GTA ACT ATC TCT 21997
Ser Met Tyr Lys Asp Ser Gln Pro Arg Gly Met Ala Val Thr Ile Ser
50 55 60 65

GTG AAG TGT GAG AAA ATT TCA ACT CTC TCC TGT GAG AAC AAA ATT ATT 22045
Val Lys Cys Glu Lys Ile Ser Thr Leu Ser Cys Glu Asn Lys Ile Ile
70 75 80

TCC TTT AAG GTAAGACTG AGCCTTACTT TGTTTTCAAT CATGTTAATA TAATCAATAT 22103
Ser Phe Lys

AATTAGAAAT ATAACATTAT TTCTAATGTT AATATAAGTA ATGTAATTAG AAAACTCAAA 22163 

TATCCTCAGA CCAACCTTTT GTCTAGAACA GAAA7AACAA GAAGCAGAGA ACCATTAAAG 22223 

TCAATACTTA CTAAAAATTA TCAAACTCTT TACCTATTGT GATAAT3ATG GTTTTTCTGA 22283 

GCCTGTZACA GGGGAAGAGG AGATACAACA CTTGTTTTAT GACCTGCATC TCCTCAACAA 22343

TCAGTCTTTA TACAAATAAT AATGTAGAAT ACATATGTGA GTTATACATT TAAGAATAAC 22403 

ATGTGACTTT CCAGAATGAG TTCTGCTATG AAGAATGAAG CTAATTATCC TTCTATATTT 22463

CTACACCTTT GTAAATTATG ATAATATTTT AATCCCTAGT TGTTTTGTTG CTGATCCTTA 22523

GCCTAAGTCT TAGACACAAG CTTCAGCTTC CAGTTGATGT ATGTTATTTT TAATGTTAAT 22583

CTAATTGAAT AAAAGTTATG AGATCAGCTG TAAAAGTAAT GCTATAATTA TCTTCAAGCC 22643 

AGGTATAAAG TATTTCTGGC CTCTACTTTT TCTCTATTAT TCTCCATTAT TATTCTCTAT 22703 

TATTTTTCTC TATTTCCTCC ATTATTGTTA GATAAACCAC AATTAACTAT AGCTACAGAC 22763

TGAGCCAGTA AGAGTAGCCA GGGATGCTTA CAAATTGGCA ATGCTTCAGA GGAGAATTCC 22823

ATGTCATGAA GACTCTTTTT GAGTGGAGAT TTGCCAATAA ATATCCGCTT TCATGCCCAC 22883

CCAGTCCCCA CTGAAAGACA GTTAGGATAT GACCTTAGTG AAGGTACCAA GGGGCAACTT 22943 

GGTAGGGAGA AAAAAGCCAC TCTAAAATAT AATCCAAGTA AGAACAGTGC ATATGCAACA 23003 

GATACAGCCC CCAGACAAAT CCCTCAGCTA TCTCCCTCCA ACCAGAGTGC CACCCCTTCA 23063

GGTGACAATT TGGAGTCCCC ATTCTAGACC TGACAGGCAG CTTAGTTATC AAAATAGCAT 23123 

AAGAGGCCTG GGATGGAAGG GTAGGGTGGA AAGGGTTAAG CATGCTGTTA CTGAACAACA 23183

TAATTAGAAG GGAAGGAGAT GGCCAAGCTC AAGCTATGTG GGATAGAGGA AAACTCAGCT 23243 

GCAGAGGCAG ATTCAGAAAC TGGGATAAGT CCGAACCTAC AGGTGGATTC TTGTTGAGGG 23303 

AGACTCCTGA AAATGTTAAG AAGATGGAAA TAATGCTTGG CACTTAGTAG GAACTGGGCA 23363

AATCCATATT TGGGGGAGCC TGAAGTTTAT TCAATTTTGA TGGCCCTTTT AAATAAAAAG 23423

AATGTGGCTG GGCGTGGTGG CTCACACCTG TAATCCCAGC ACTTTGGGAG GCCGAGGGGG 23483

GCGGATCACC TGAAGTCAGG AGTTCAAGAC CAGCCTGACC AACATGGAGA AACCCCATCT 23543 

CTACTAAAAA TACAAAATTA GCTGGGCGTG GTGGCATATG CCTGTAATCC CAGCTACTCG 23603 

GGAGGCTGAG GCAGGAGAAT CTTTTCAACC CGGGAGGCAG AGGTTGCGAT GAGCCTAGAT 23663

CGTGCCATTG CACTCCAGCC TGGGCAACAA GAGCAAAACT CGGTCTCAAA AAAAAAAAAA 23723 

AAAAAGTGAA ATTAACCAAA GGCATTAGCT TAATAATTTA ATACTGTTTT TAAGTAGGGC 23783

GGGGGG^GGC TCGAAGAGAT CTGTGTAAAT GAGGGAATCT GACATTTAAG CTTCATCAGC 23843

ATCATAGCAA ATCTGCTTCT CGAAGGAACT CAATAAATAT TAGTTGGAGG GGGGGAGACA 23903 

GTGAGGGGTG GACTAGGACC AGTTTTAGCC CTTGTCTTTA ATCCCTTTTC CTGCCACTAA 23963 

TAAGGATCTT AGCAGTGGTT ATAAAAGTGG CCTAGGTTCT AGATAATAAG ATACAACAGG 24023 

CCAGGCACAG TGGCTCATGC CTATAATCCC AGCACTTTGG GAGGGCAAGG CGAGTGTCTC 24083

ACTTGAGATC AGGAGTTCAA GACCAGCCTG GCCAGCATGG CGATACTCTG TCTCTACTAA 24143 

AAAAAATACA AAAATTAGCC AGGCATGGTG GCATGCACCT GTAATCCCAG CTACTCGTGA 24203 

GCCTGAGGCA GAAGAATCGC TTGAAACCAG GAGGTGTAGG CTGCAGTGAG CTGAGATCGC 24263 

ACCACTGCAC TCCAGCCTGG GCGACAGAAT GAGACTTTGT CTCAAAAAAA GAAAAAGATA 24323 

CAACAGGCTA CCCTTATGTG CTCACCTTTC ACTGTTGATT ACTAGCTATA AAGTCCTATA 24383

AAGTTCTTTG GTCAAGAACC TTGACAACAC TAAGAGGGAT TTGCTTTGAG AGGTTACTGT 24443 

CAGAGTCTGT TTCATATATA TACATATACA TGTATATATG TATCTATATC CAGGCTTGGC 24503 

CAGGGTTCCC TCAGACTTTC CAGTGCACTT GGGAGATGTT AGGTCAATAT CAACTTTCCC 24563 

TGGATTCAGA TTCAACCCCT TCTGATGTAA AAAAAAAAAA AAAAAAGAAA GAAATCCCTT 24623 

TCCCCTTGGA GCACTCAAGT TTCACCAGGT GGGGCTTTCC AAGTTGGGGG TTCTCCAAGG 24683 

TCATTGGGAT TGCTTTCACA TCCATTTGCT ATGTACCTTC CCTATGATGG CTGGGAGTGG 24743 

TCAACATCAA AACTAGGAAA GCTACTGCCC AAGGATGTCC TTACCTCTAT TCTGAAATGT 24803 

GCAATAAGTG TGATTAAAGA GATTGCCTGT TCTACCTATC CACACTCTCG CTTTCAACTC 24863 

TAACTTTCTT TTTTTCTTTT TTTCTTTTTT TCTTTTTTTT TGAAACGGAG TCTCGCTCTG 24923 

TCGCCCAGGC TAGAGTGCAG TGGCACGATC TCAGCTCACT GCAAGCTCTG CCTCCCGGGT 24983 

TCACGCCATT CTCCTGCCTC ACCCTCCCAA GCAGCTGGGA CTACAGGCGC CTGCCACCAT 25043 

GCCCAGCTAA TTTTTTGTAT TTTTAGTAGA GACGGGGTTT CACCGTGTTA GCCAGGATGG 25103 

TCTCGATCTC CTGAACTTGT GATCCGCCCG CCTCAGCCTC CCAAAGTGCT GGGATTACAG 25163 

GCGTGAGCCA TCGCACCCGG CTCAACTGTA ACTTTCTATA CTGGTTCATC TTCCCCTGTA 25223 

ATGTTACTAG AGCTTTTGAA GTTTTGGCTA TGGATTATTT CTCATTTATA CATTAGATTT 25283 

CAGATTAGTT CCAAATTGAT GCCCACAGCT TAGGGTCTCT TCCTAAATTG TATATTGTAG 25343 

ACAGCTGCAG AAGTGGGTGC CAATAGGGGA ACTAGTTTAT ACTTTCATCA ACTTAGGACC 25403

CACACTTGTT GATAAAGAAC AAAGGTCAAG AGTTATGACT ACTGATTCCA CAACTGATTG 25463

AGAAGTTGGA GATAACCCCG TGACCTCTGC CATCCAGAGT CTTTCAGGCA TCTTTGAAGG 25523

ATGAAGAAAT GCTATTTTAA TTTTGGAGGT TTCTCTATCA GTGCTTAGGA TCATCGGAAT 25583

CTGTGCTGCC ATGAGGCCAA AATTAAGTCC AAAACATCTA CTGGTTCCAG GATTAACATG 25643

GAAGAACCTT AGGTGGTGCC CACATGTTCT GATCCATCCT GCAAAATAGA CATGCTGCAC 25703

TAACAGGAAA AGTGCAGGCA GCACTACCAC TTGGATAACC TGCAAGATTA TAGTTTCAAG 25763

TAATCTAACC ATTTCTCACA AGGCCCTATT CTGTGACTGA AACATACAAG AATCTGCATT 25823

TGGCCTTCTA AGGCAGGGCC CAGCCAAGGA GACCATATTC AGGACAGAAA TTCAAGACTA 25883

CTATGGAACT GGAGTGCTTG GCAGGGAAGA CAGAGTCAAG GACTGCCAAC TGAGCCAATA 25943

CAGCAGGCTT ACACAGGAAC CCAGGGCCTA GCCCTACAAC AATTATTGGG TCTATTCACT 26003

GTAAGTTTTA ATTTCAGGCT CCACTGAAAG AGTAAGCTAA GATTCCTGGC ACTTTCTGTC 26063

TCTCTCACAG TTGGCTCAGA AATGAGAACT GGTCAGGCCA GGCATGGTGG CTTACACCTG 26123

GAATCCCAGC ACTTTGGGAG GCCGAAGTGG GAGGGTCACT TGAGGCCAGG AGTTCAGGAC 26183

CAGCTTAGGC AACAAAGTGA GATACCCCCT GACCCCTTCT CTACAAAAAT AAATTTTAAA 26243

AATTAGCCAA ATGTGGTGGT GTATACTTAC AGTCCCAGCT ACTCAGGAGG CTGAGGCAGG 26303

GGGATTGCTT GAGCCCAGGA ATTCAAGGCT GCAGTGAGCT ATGATTTCAC CACTGCACTT 26363

CTGGCTGGGC AACAGAGCGA GACCCTGTCT CAAAGCAAAA AGAAAAAGAA ACTAGAACTA 26423

GCCTAAGTTT GTGGGAGGAG GTCATCATCG TCTTTAGCCG TGAATGGTTA TTATAGAGGA 26483

CAGAAATTGA CATTAGCCCA AAAAGCTTGT GGTCTTTGCT GGAACTCTAC TTAATCTTGA 26543

GCAAATGTGG ACACCACTCA ATGGGAGAGG AGAGAAGTAA GCTGTTTGAT GTATAGGGGA 26603

AAACTAGAGG CCTGGAACTG AATATGCATC CCATGACAGG GAGAATAGGA GATTCGGAGT 26663

TAAGAAGGAG AGGAGGTCAG TACTGCTGTT CAGAGATTTT TTTTATGTAA CTCTTGAGAA 26723

GCAAAACTAC TTTTGTTCTG TTTGGTAATA TACTTCAAAA CAAACTTCAT ATATTCAAAT 26783

TGTTCATGTC CTGAAATAAT TAGGTAATGT TTTTTTCTCT ATAG GAA ATG AAT CCT 26839
Glu Met Asn Pro
85

CCT GAT AAC ATC AAG GAT ACA AAA AGT GAC ATC ATA TTC TTT CAG AGA 26887
Pro Asp Asn Ile Lys Asp Thr Lys Ser Asp Ile Ile Phe Phe Gln Arg
90 95 100

AGT GTC CCA GGA CAT GAT AAT AAG ATG CAA TTT GAA TCT TCA TCA TAC 26935
Ser Val Pro Gly His Asp Asn Lys Met Gln Phe Glu Ser Ser Ser Tyr
105 110 115 120

GAA GGA TAC TTT CTA GCT TGT GAA AAA GAG AGA GAC CTT TTT AAA CTC 26983
Glu Gly Tyr Phe Leu Ala Cys Glu Lys Glu Arg Asp Leu Phe Lys Leu
125 130 135

ATT TTG AAA AAA GAG GAT GAA TTG GGG GAT AGA TCT ATA ATG TTC ACT 27031
Ile Leu Lys Lys Glu Asp Glu Leu Gly Asp Arg Ser Ile Met Phe Thr
140 145 150

GTT CAA AAC GAA GAC T AGCTATTAAA ATTTCATGCC GGGCGCAGTG GCTCACGCCT 27087
Val Gln Asn Glu Asp
155

GTAATCCCAG CCCTTTGGGA GGCTGAGGCG GGCAGATCAC CAGAGGTCAG GTGTTCAAGA 27147

CCAGCCTGAC CAACATGGTG AAACCTCATC TCTACTAAAA ATACAAAAAA TTAGCTGAGT 27207

GTAGTGACCC ATGCCCTCAA TCCCAGCTAC TCAAGAGGCT GAGGCAGGAG AATCACTTGC 27267

ACTCCGGAGG TGGAGGTTGT GGTGAGCCGA GATTGCACCA TTGCGCTCTA GCCTGGGCAA 27327

CAACAGCAAA ACTCCATCTC AAAAAATAAA ATAAATAAAT AAACAAATAA AAAATTCATA 27387

ATGTGAACTG TCTGAATTTT TATGTTTAGA AAGATTATGA GATTATTAGT CTATAATTGT 27447

AATGGTGAAA TAAAATAAAT ACCAGTCTTG AAAAACATCA TTAAGAAATG AATGAACTTT 27507

CACAAAAGCA AACAAACAGA CTTTCCCTTA TTTAAGTGAA TAAAATAAAA TAAAATAAAA 27567

TAATGTTTAA AAAATTCATA GTTTGAAAAC ATTCTACATT GTTAATTGGC ATATTAATTA 27627

TACTTAATAT AATTATTTTT AAATCTTTTG GGTTATTAGT CCTAATGACA AAAGATATTG 27687

ATATTTGAAC TTTCTAATTT TTAAGAATAT CGTTAAACCA TCAATATTTT TATAAGGAGG 27747

CCACTTCACT TGACAAATTT CTGAATTTCC TCCAAAGTCA GTATATTTTT AAAATTCAGT 27807

TTGATCCTGA ATCCAGCAAT ATATAAAAGG GATTATATAC TCTGGCCAAC TGACATTCAT 27867

CCTAGGAATG CAAAGATGGT TTAATATCCT AAAATCAATT AACATAACAT ACTATATTAA 27927

TAAAGTATCA AAACAGTATT CTCATCTTTT TTTCTTTTTT CACAATTCCT TGGTTACACT 27987

ATCATCTCAA TAGATGCAGA AAAAGCATTT GACAAAATCC AATTCATAAT AAAAATTCTC 28047

AAACTTGAAA GAGAACATCA TAAAGGCATC TATGAAAAAC CTACAGCTAA TATCATACTT

AACGATGAAA AACTGAATTA TTTTACCCTA AGATCAAGAA TAATGCAAGC ATGTCAGCTC

TTGCAACTTC TATTCAACAT TGTACTGGAG GTTCTAGCCA GAGCAACCAT ACAATAAATA

AAAATAAAAG GCACCCAGAT TAGAAAGGAA GTCTTTATTT GCAGACAACA TGGTTCTTTA

TGCAGAAAAC CGTCAGGAAT ACACACACAT GTTAGAACTA ATAAGTTCAG CAAGGTTGCA

GGTTGCAATA TCAATATGCA AAAATACATT GAAGGCTGGG CTCAGTGGAG ATGGCATGTA

CCTTTCGTCC CAGCTACTTG GGAGGCTGAG GTAGGAGGAT CACTTGAGGT GAGGAGTTTG

AGGCTATAGT GCAATGTGAT CTTGCCTGTG AATAGCCACT GCACTCGAGC CTAGGCAACA

AAGTGAGACC CCGTCTCCAA AAAAAAAAAT GGTATATTGG TATTTCTGTA TATGAACAAT

GAATGATCTG AAAACAAGAA AATTCCATTC ACGATGGTAT TAAAAAAATA AAATACAAAT

AAATTTAGCA AAATAATTAT AAAACTTGTA CATCGAAAAT TTCAAAGCAC TCTGAGGGAA

ATTAAAGATG ATCTAAATAA TTGGAGAGAC ACTCTATGAT CACTGATTGG AAAATTCATT

CAATATTGTT AAGATAACAA TTGTCCCCAA ATTGATGCAT GCATTCAATT TAGTCTTCAT

CAAAATTCCA GCAGGGTTTT TGCAGAAATT GACAAGCTGT ACCCAAAATG TATATGGAAA

TGAAAAGACC CAGAAGAGCA AATAATTTTT TAAAAACAAA GTTGGAAAAC TTTTACTTCC

TAATTTTAAA ACTTACTATA AACCTAAAGT TATCAAGACC ATTTAGT

14. The genomic DNA of claim 1, which is derived from human.

15. The genomic DNA of claim 1, which is inserted into an autonomously replicable vector.

16. A transformant derived from a mammalian cell, which contains the genomic DNA of claim 1.

17. The transformant of claim 16, which is derived from a cell selected from the group consisting of epithelial, interstitial
and hemopoietic cells from a mammal.

18. A process for preparing a polypeptide, which comprises (a) artificially expressing the DNA of claim 1, and (b)
collecting a polypeptide capable of inducing the production of interferon-γ by immunocompetent cells from the
resultant mixture.

19. The process of claim 18, wherein the artificial expression of the step (a) comprises a step of culturing the
transformant of claim 16.

20. The process of claim 18, wherein the resultant mixture of the step (b) contains a culture of the transformant of
claim 16.

21. The process of claim 18, wherein the polypeptide is collected by one or more techniques selected from the group
consisting of salting out, dialysis, filtration, concentration, separatory sedimentation, ion-exchange chromatography,
gel filtration chromatography, adsorption chromatography, isoelectric point chromatography, hydrophobic
chromatography, reversed phase chromatography, affinity chromatography, gel electrophoresis and isoelectric focusing.

22. The process of claim 18, wherein the polypeptide is collected by an immunoaffinity chromatography with a
monoclonal antibody.